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1 Introduction 

Statistical mechanics deals with large systems of stochastically interacting microscopic elements (parti- 
cles, atomic magnets, polymers, etc.). The strategy of statistical mechanics is to abandon any ambition 
to solve models of such systems at the microscopic level of individual elements, but to use the mi- 
croscopic laws to calculate equations describing the behaviour of a suitably chosen set of macroscopic 
observables. The toolbox of statistical mechanics consists of methods to perform this reduction from 
the microscopic to a macroscopic level, which are all based on efficient ways to do the bookkeeping of 
probabilities. The experience and intuition that has been built up over the last century tells us what 
to expect, and serves as a guide in finding the macroscopic observables and in seeing the difference 
between relevant mathematical subtleties and irrelevant ones. As in any statistical theory, clean and 
transparent mathematical laws can be expected to emerge only for large (preferably infinitely large) 
systems. In this limit one often encounters phase transitions, i.e. drastic changes in the system's 
macroscopic behaviour at specific values of global control parameters. 

Recurrent neural networks, i.e. neural networks with synaptic feedback loops, appear to meet the 
criteria for statistical mechanics to apply, provided we indeed restrict ourselves to large systems. Here 
the microscopic stochastic dynamical variables are the firing states of the neurons or their membrane 
potentials, and one is mostly interested in quantities such as average state correlations and global 
information processing quality, which are indeed measured by macroscopic observables. In contrast 
to layered networks, one cannot simply write down the values of successive neuron states for models 
of recurrent neural networks; here they must be solved from (mostly stochastic) coupled dynamic 
equations. Under special conditions ('detailed balance'), which usually translate into the requirement 
of synaptic symmetry, the stochastic process of evolving neuron states leads towards an equilibrium 
situation where the microscopic state probabilities are known, and where the techniques of equilib- 
rium statistical mechanics can be applied in one form or another. The equilibrium distribution found, 
however, will not always be of the conventional Boltzmann form. For non-symmetric networks, where 
the asymptotic (stationary) statistics are not known, dynamical techniques from non- equilibrium sta- 
tistical mechanics are the only tools available for analysis. The 'natural' set of macroscopic quantities 
(or 'order parameters') to be calculated can be defined in practice as the smallest set which will obey 
closed deterministic equations in the limit of an infinitely large network. 

Being high-dimensional non-linear systems with extensive feedback, the dynamics of recurrent 
neural networks are generally dominated by a wealth of attractors (fixed-point attractors, limit-cycles, 
or even more exotic types), and the practical use of recurrent neural networks (in both biology and 
engineering) lies in the potential for creation and manipulation of these attractors through adaptation 
of the network parameters (synapses and thresholds). Input fed into a recurrent neural network usually 
serves to induce a specific initial configuration (or firing pattern) of the neurons, which serves as a cue, 
and the 'output' is given by the (static or dynamic) attractor which has been triggered by this cue. The 
most familiar types of recurrent neural network models, where the idea of creating and manipulating 
attractors has been worked out and applied explicitly, are the so-called attractor neural networks for 
associative memory, designed to store and retrieve information in the form of neuronal firing patterns 
and/or sequences of neuronal firing patterns. Each pattern to be stored is represented as a microscopic 
state vector. One then constructs synapses and thresholds such that the dominant attractors of the 
network are precisely the pattern vectors (in the case of static recall), or where, alternatively, they 
are trajectories in which the patterns are successively generated microscopic system states. From an 
initial configuration (the 'cue', or input pattern to be recognised) the system is allowed to evolve in 
time autonomously, and the final state (or trajectory) reached can be interpreted as the pattern (or 
pattern sequence) recognized by network from the input (see figure [l]) . For such programmes to work 
one clearly needs recurrent neural networks with extensive 'ergodicity breaking': the state vector will 



1 INTRODUCTION 



3 




Figure 1: Information processing by recurrent neural networks through the creation and manipulation 
of attractors in state space. Patterns stored: the microscopic states •. If the synapses are symmetric we 
will generally find that the attractors will have to be fixed-points (left picture). With non-symmetric 
synapses, the attractors can also be sequences of microscopic states (right picture). 

during the course of the dynamics (at least on finite time-scales) have to be confined to a restricted 
region of state space (an 'ergodic component'), the location of which is to depend strongly on the 
initial conditions. Hence our interest will mainly be in systems with many attractors. This, in turn, 
has implications at a theoretical/mathematical level: solving models of recurrent neural networks 
with extensively many attractors requires advanced tools from disordered systems theory, such as 
replica theory (statics) and generating functional analysis (dynamics). It will turn out that a crucial 
issue is whether or not the synapses are symmetric. Firstly, synaptic asymmetry is found to rule out 
microscopic equilibrium, which has implications for the mathematical techniques which are available: 
studying models of recurrent networks with non-symmetric synapses requires solving the dynamics, 
even if one is only interested in the stationary state. Secondly, the degree of synaptic asymmetry turns 
out to be a deciding factor in determining to what extent the dynamics will be glassy, i.e. extremely 
slow and non-trivial, close to saturation (where one has an extensive number of attractors). 

In this paper (on statics) and its sequel (on dynamics) I will discuss only the statistical mechanical 
analysis of neuronal firing processes in recurrent networks with static synapses, i.e. network operation 
as opposed to network learning. I will also restrict myself to networks with either full or randomly 
diluted connectivity, the area in which the main progress has been made during the last few decades. 
Apart from these restrictions, the text aims to be reasonably comprehensive and self-contained. Even 
within the confined area of the operation of recurrent neural networks a truly impressive amount has 
been achieved, and many of the constraints on mathematical models which were once thought to be 
essential for retaining solvability but which were regrettable from a biological point of view (such as 
synaptic symmetry, binary neuron states, instantaneous neuronal communication, a small number of 
attractors, etc.) have by now been removed with success. At the beginning of the new millennium 
we know much more about the dynamics and statics of recurrent neural networks than ever before. 
I aim to cover in a more or less unified manner the most important models and techniques which 



2 DEFINITIONS & PROPERTIES OF MICROSCOPIC LAWS 



4 



have been launched over the years, ranging from simple symmetric and non-symmetric networks with 
only a finite number of attractors, to the more complicated ones with an extensive number, and I will 
explain in detail the techniques which have been designed and used to solve them. 

In the present paper I will first discuss and solve various members of the simplest class of models: 
those where all synapses are the same. Then I turn to the Hopfield model, which is the archetypical 
model to describe the functioning of symmetric neural networks as associative memories (away from 
saturation, where the number of attractors is finite), and to a coupled oscillator model storing phase 
patterns (again away from saturation). Next I will discuss a model with Gaussian synapses, where 
the number of attractors diverges, in order to introduce the so-called replica method, followed by 
a section on the solution of the Hopfield model near saturation. I close this paper with a guide to 
further references and an assessment of the past and future deliverables of the equilibrium statistical 
mechanical analysis of recurrent neural networks. 



2 Definitions Sz Properties of Microscopic Laws 

In this section I define the most common microscopic models for recurrent neural networks, I show 
how one can derive the corresponding descriptions of the stochastic evolution in terms of evolving 
state probabilities, and I discuss some fundamental statistical mechanical properties. 



2.1 Stochastic Dynamics of Neuronal Firing States 

Microscopic Definitions for Binary Neurons. The simplest non-trivial definition of a recurrent neural 
network is that where A" binary neurons o~i G {—1, 1} (in which the states '1' and '-1' represent firing 
and rest, respectively) respond iteratively and synchronously to post-synaptic potentials (or local 
fields) hi(cr), with a = (<ti, . . . , <tjv)- The fields are assumed to depend linearly on the instantaneous 
neuron states: 

Parallel : a % {£+\) = sgn [hi(<r(£)) + T Vi (£)} hi(a) = £ Jyaj + B % (1) 

3 

The stochasticity is in the independent random numbers r)i(t) G K (representing threshold noise), 
which are all drawn according to some distribution w{rj). The parameter T is introduced to control 
the amount of noise. For T = the process (|l|) is deterministic: ai(£+l) = sgn[hi(a (£))]. The opposite 
extreme is choosing T = oo, here the system evolution is fully random. The external fields 6% represent 
neural thresholds and/or external stimuli, Jij represents the synaptic efficacy at the junction j —* i 
(Jij > implies excitation, < inhibition). Alternatively we could decide that at each iteration 
step £ only a single randomly drawn neuron o~i t is to undergo an update of the type (|l|): 

i + ir- <Ti(t+l) = o-i{£) 
Sequential : . = ^ . = ggn [h . {(T{l)) + j^)] (2) 

with the local fields as in (||) . The stochasticity is now both in the independent random numbers rji (£) 
(the threshold noise) and in the site ig to be updated, drawn randomly from the set {1, . . . , A^}. For 
simplicity we assume w(—rj) = w{rf), and define 

f z d 
g[z] = 2 drj w(rj) : g[-z] = -g[z], lim g[z] = ±1, -rg[z] > 

Jo «-»±oo dz 

Popular choices for the threshold noise distributions are 
10(77) = (2vr) _ 5 e -5 ? ? 2 : g[ z ] = Eri[z/V2], 10(77) = ^[!-tanh 2 (r/)] : g[z] = tanh(z) 
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From Stochastic Equations to Evolving Probabilities. From the microscopic equations (|l],Q), which are 
suitable for numerical simulations, we can derive an equivalent but mathematically more convenient 
description in terms of microscopic state probabilities pi(cr). Equations (|l],Q) state that, if the system 
state <t(£) is given, a neuron i to be updated will obey 

Prob [o-i(£+l)} = i [1 + g^h^ail))]] (3) 

with (3 = T _1 . In the case (|l]) this rule applies to all neurons, and thus we simply get p^ + i{a) = 
\[f = l \ [l+c* g\flh>i((r(l))]]. If, on the other hand, instead of cr(£) only the probability distribution 
Pi(cr) is given, this expression for p£ + i(cr) is to be averaged over the possible states at time £: 

N 1 

Parallel : Pe+1 (a) = E W [cr; a'] p £ (a') W [cr; a'] = J[ - [1 + a { g[/3hi(a')]] (4) 

<X' i=l 

This is the standard representation of a Markov chain. Also the sequential process (§) can be formu- 
lated in terms of probabilities, but here expression (]3|) applies only to the randomly drawn candidate 
t£. After averaging over all possible realisations of the sites ii we obtain: 



N i 



+ o-i g[(3hi(a (£))]} 



(with the Kronecker symbol: 5ij = 1 if i = j, 5ij = otherwise). If, instead of cr(£), the probabilities 
Pi(cr) are given, this expression is to be averaged over the possible states at time £, with the result: 

Vl+xW) = ^E^t 1 + ^ g[Phi{a)]]p e Xa) + ^E^t 1 + °i giPh^a)}} p e (F i( r) 

i i 

with the state-flip operators F^(a) = ^(a%, . . . , o"j_i,-<Tj, o*i+i, . . . , o~n)- This equation can again be 
written in the standard form pi + i(a) = J2cr' W [cr; cr']pi(a'), but now with the transition matrix 

Sequential: W [cr; cr'] = 5^^' + E { w i( F i (T ) S (T,F 1 cr' ~ Wi{c^)^a,a'} (5) 

i 

where 5 a ,cr' = Yli^,^ and 

Wi(cr) = -[l-o-ita,nh[f3hi(cr)]] (6) 

Note that, as soon as T > 0, the two transition matrices V7[cr;cr'] in (||,|5|) both describe ergodic 
systems: from any initial state cr' one can reach any final state cr with nonzero probability in a finite 
number of steps (being one in the parallel case, and iV in the sequential case). It now follows from the 
standard theory of stochastic processes (see e.g. |2|) that in both cases the system evolves towards 
a unique stationary distribution p^ (cr), where all probabilities Poo(c) are non-zero. 

From Discrete to Continuous Times. The above processes have the (mathematically and biologically) 
less appealing property that time is measured in discrete units. For the sequential case we will now 
assume that the duration of each of the iteration steps is a continuous random number (for parallel 
dynamics this would make little sense, since all updates would still be made in full synchrony). The 
statistics of the durations are described by a function 7r^(i), defined as the probability that at time t 
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precisely t updates have been made. Upon denoting the previous discrete-time probabilities as Pi(cr), 
our new process (which now includes the randomness in step duration) will be described by 

p t (a) = £ 7r e (t)p e (a) = £ 7r e (t) £ W l [a; a'] p (a') 

e>o e>o a' 

and time has become a continuous variable. For ng(t) we make the Poisson choice irg(t) = 4 (-^)^e~*/ A . 
From (£) n = t/A and {£ 2 } n = t/A + t 2 /A 2 it follows that A is the average duration of an iteration step, 
and that the relative deviation in I at a given t vanishes for A — > as y/ (£ 2 ) w — (£) 2 /(£) n = \/ A/t. 
The nice properties of the Poisson distribution under temporal derivation allow us to derive: 

a 1 

For sequential dynamics we choose A = so that, as in the parallel case, in one time unit each neuron 
will on average be updated once. The master equation corresponding to (||) acquires the form 

^t(°") = Yl {wi{Ficr)p t (Ficr) - Wi(cr)p t (cr)} (7) 

i 

The Wi(tr) (||) now play the role of transition rates. The choice A = -i implies \/ (l 2 )^ — (t) 2 / = 
\/l/Nt, so we will still for N — > oo no longer have uncertainty in where we are on the t axis. 

Microscopic Definitions for Continuous Neurons. Alternatively, we could start with continuous neu- 
ronal variables tTj (representing e.g. firing frequencies or oscillator phases), where i = 1, ... ,N, and 
with stochastic equations of the form 

a^t+A) = ai(t) + Afi{a{t)) + V^TA^t) (8) 

Here we have introduced (as yet unspecified) deterministic state-dependent forces fi(er), and uncorre- 
cted Gaussian distributed random forces (the noise), with = and = hj^t,v- 
As before, the parameter T controls the amount of noise in the system, ranging from T = (deter- 
ministic dynamics) to T = oo (completely random dynamics). If we take the limit A — > in (|8|) we 
find a Langevin equation (with a continuous time variable): 

±a l (t) = f i {a(t))+ m (t) (9) 

This equation acquires its meaning only as the limit A — > of (||). The moments of the new noise 
variables rn(t) = Ci(t)y/2T/A in © are given by = and (Vi(t)Vj(t')} = 2T8 ij 5(t-t'). This can 

be derived from the moments of the For instance: 



2T 1 



The constant C is found by summing over i', before taking the limit A —* 0, in the above equation: 

„ oo oo 

y eft' (Vi(t) Vj ^)) = lim 2T ^ = 2I %|™ E V = 2T ^ 

t'=— oo t'=— oo 

Thus C = 1, which indeed implies {f]i{t)rjj{t')) = 2T6ijS(t—t'). More directly, one can also calculate 
the moment generating function 



( e i/*Ei^(*)»h(*)\ = lim TT e -|^+^i(t)V2TA = Hm TT e -TA^(t) = g-T (10) 



2 DEFINITIONS & PROPERTIES OF MICROSCOPIC LAWS 



7 



From Stochastic Equations to Evolving Probabilities. A mathematically more convenient description 
of the process (||) is provided by the Fokker-Planck equation for the microscopic state probability 
density pt(<x) = {5[cr — cr (t)]) , which we will now derive. For the discrete-time process (Jsj) we expand 
the ^-distribution in the definition of pt+ A {(r) (in a distributional sense): 

p t+A (a) - p t (a) = (5 [a-a(t)-Af(a(t))-V2TA^(t)})- (5[tr-tr(t)]) 

i ' ij i 3 

The variables <r(t) depend only on noise variables £j(t') with t' < t, so that for any function A: 
(A[tr(t)]&(t)) = (A[a(t)})(Zi(t)) = 0, and (A[tr(t)]^(t)^(t)) = 5ij{A[a{t)]). As a consequence: 

1 [p t+A (a) - p t (a)] = - Y^^{5[a-a(m(a(t))) + T £ ^[cr-cr^)]) + 0(A*) 



i ' t * 

By taking the limit A — ► we then arrive at the Fokker-Planck equation: 



r/f 



(11) 



Examples: Graded Response Neurons and Coupled Oscillators. In the case of graded response neurons 
the continuous variable Oi represents the membrane potential of neuron i, and (in their simplest form) 
the deterministic forces are given by fi(cr) = J2j Jij tanhpycrj] — o~i + 8i, with 7 > and with the 9i 
representing injected currents. Conventional notation is restored by putting U{ — > Uj. Thus equation 
(||) specialises to 

^Ui{t) = J ij tanh[7«,(i)] - m(t) + 6i + j/j(t) (12) 
j 

One often chooses T = (i.e. 77$ (i) = 0), the rationale being that threshold noise is already assumed 
to have been incorporated via the non-linearity in (|l2|). 

In our second example the variables <Xj represent the phases of coupled neural oscillators, with 
forces of the form fi(cr) = J2j Jij srn ( °~j — + Individual synapses Jij now try to enforce either 
pair-wise synchronisation (Jj,- > 0) or pair- wise anti-synchronisation (Jy < 0), and the uji represent 
the natural frequencies of the individual oscillators. Conventional notation dictates Oi — > (pi, giving 

j^itt) = + E J iJ sin[<^(t) -&(*)] + rji(t) (13) 
3 

2.2 Synaptic Symmetry & Lyapunov Functions 

Noise-free Symmetric Networks of Binary Neurons. In the deterministic limit T — » the rules ^ for 
networks of synchronously evolving binary neurons reduce to the deterministic map 



*i(£+l) = sgn \hi{a{i))] 



(14) 
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It turns out that for systems with symmetric interactions, Jij = Jji for all (ij), one can construct a 
Lyapunov function, i.e. a function of <x which during the dynamics decreases monotonically and is 
bounded from below (see e.g. ||): 

Binary & Parallel : L[cr] = - ^ \hi(cr) \ - ^ efii (15) 

i i 

Clearly L >— J2i[J2j l^jl + l^l] - Si \@i\- During iteration of (14) we find: 
L[<r(l + 1)]-L[<r(£)] = - £ \hi(*(£+l))\ + £ ^(M-lJE ~ £ ^ [<^+l)-^)] 

i i j i 

= -£ M(t(£+1))\ + £ ^(^(o-^+l)) = - M<r(i+1))\ [l-<ri(£+2)vi(£)] < 



(where we used (|14|) and Jij = Jji). So L decreases monotonically until a stage is reached where 
<7j(£ + 2) = (Ji{£) for all i. Thus, with symmetric interactions this system will in the deterministic limit 
always end up in a limit cycle with period < 2. A similar result is found for networks with binary 
neurons and sequential dynamics. In the limit T — > the rules (||) reduce to the map 

+ 1) = 5i, it sgn M*{1))\ + [1 - S iM }ai(e) (16) 

(in which we still have randomness in the choice of site to be updated). For systems with symmetric 
interactions and without self- interactions, i.e. Jn = for all i, we again find a Lyapunov function: 

Binary Sz Sequential : L[er] = — - o~iJijOj — ofti (17) 

ij i 

This quantity is bounded from below: L > — \ J2ij \Jij\ Upon calling the site ii selected for 

update at step i simply i, the change in L during iteration of (|i~6|) can be written as: 



L[a(£ + 1)] - L[a{£)\ = -0^+1)-^)] - ^£ J ik [ai(£+l)a k (£+l) - ai{£)a k {£)\ 

1 k 

j 

= [<Ji(£) - (7i(£+l)]E Jijotf) + Oi] = -M<t{£))\ [1 - ai{£)ai{£+l)] < 
j 

Here we used (|jl|), J« = Jji, and absence of self- interactions. Thus L decreases monotonically until 
<Tj(i + l) = Oi{t) for all i. With symmetric synapses, but without diagonal terms, the sequentially 
evolving binary neurons system will in the deterministic limit always end up in a stationary state. 

Noise-free Symmetric Networks of Continuous Neurons. One can derive similar results for models with 
continuous variables. Firstly, in the deterministic limit the graded response equations (|l^) simplify to 

jUi{t) = £ Jij t&nh[j Uj (t)] - Ui (t) + Oi (18) 
j 

Symmetric networks again admit a Lyapunov function (there is no need to eliminate self- interactions): 

Graded Response : L[u] = — — ^ Jjj tanh[7^] tanh[7n 7 -]+y^ 7/ dv v[l— tanh 2 [7f]] — 6i tanh^itj 

z . . Jo 
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Clearly L > — g J2ij \Jij\ ~ J2i (the term in L[u] with the integral is non-negative). During the 
noise-free dynamics ( |Ts|) one can use the identity dL/dui = — 7[1— t&nh 2 [yui]](dui / dt) , valid only when 
Jij = Jji, to derive 

Again L is found to decrease monotonically, until dui/dt = for all i, i.e. until we are at a fixed-point. 
Finally, the coupled oscillator equations (|l3|) reduce in the noise-free limit to 

j t 4>i{t) =LOi + Y, J n skfai (*)-&(*)] ( 19 ) 

Note that self- interactions J a always drop out automatically. For symmetric oscillator networks, a 
construction of the type followed for the graded response equations would lead us to propose 

Coupled Oscillators : L[4>] = — — ^ Jij cos[(/>j — <j>j] — ^ Ui4>i (20) 

ij i 

This function indeed decreases monotonically, due to dL/dfa = —dcpi/dt : 

d_ dLd_^ = ± 

dt Y d<f>i dt Y dt ~ 

In fact ([l9|) describes gradient descent on the surface L[cj)]. However, due to the term with the natural 
frequencies U{ the function L[4>] is not bounded, so it cannot be a Lyapunov function. This could 
have been expected; when = for all for instance, one finds continually increasing phases 

4>i(t) = <pi{Q)+0Jit. Removing the uj{, in contrast, gives the bound L > — J2j \ Jij\- Now the system 
must go to a fixed-point. In the special case Ui = lo (N identical natural frequencies) we can transform 
away the LOi by putting <f)(t) = <fri(t)+ut, and find the relative phases (pi to go to a fixed-point. 

2.3 Detailed Balance & Equilibrium Statistical Mechanics 

Detailed Balance for Binary Networks. The results obtained above indicate that networks with sym- 
metric synapses are a special class. We now show how synaptic symmetry is closely related to the 
detailed balance property, and derive a number of consequences. An ergodic Markov chain of the form 
(|||, i.e. 

Vi + M) = Y,W[a;a']p t {a') (21) 

is said to obey detailed balance if its (unique) stationary solution Poo(cr) has the property 

W [cr; a'] Poo (a')=W [a'; a] Poo (a) for all a. a 1 (22) 

All Poo(°") which satisfy ([22] ) are stationary solutions of (|2l]), this is easily verified by substitution. 
The converse is not true. Detailed balance states that, in addition to p a{< J ) being stationary, one has 
equilibrium: there is no net probability current between any two microscopic system states. 

It is not a trivial matter to investigate systematically for which choices of the threshold noise 
distribution w{rj) and the synaptic matrix {Jij} detailed balance holds. It can be shown that, apart 
from trivial cases (e.g. systems with self- interactions only) a Gaussian distribution w(rj) will not 
support detailed balance. Here we will work out details only for the choice w(rj) = — tanh 2 ^)], 



2 DEFINITIONS & PROPERTIES OF MICROSCOPIC LAWS 



10 



and for T > (where both discrete systems are ergodic). For parallel dynamics the transition matrix 
is given in (||), now with g[z] = tanh[z], and the detailed balance condition (|^) becomes 

for all a, a' (23) 



Il< cosh^/ii {a')] Ui cosh[/3hi (cr)] 

All Poo(cr) are non-zero (ergodicity) , so we may safely put Poo(c) = e^E* ^o-i+^cr)] cosh[/3/ii (cr)] , 
which, in combination with definition (|IJ) simplifies the detailed balance condition to: 

K(a) - K{*') = J2 °i [Jij ~ Jji] ^ all cr, cr' (24) 

u 

Averaging ( p4| ) over all possible cr' gives K(er) = (K^cr')}^/ for all cr, i.e. iT is a constant, whose value 
follows from normalising Poo(cr). So, if detailed balance holds the equilibrium distribution must be: 

Pcq (a) ~ e^^^iJJcosh^Co-)] (25) 



For symmetric systems detailed balance indeed holds: ( |25| ) solves (23), since K(cr) = K solves the 
reduced problem (p4|). For non-symmetric systems, however, there can be no equilibrium. For K{a) = 
K the condition ( |24| ) becomes YUj a i — Jji] a j = ® f° r an °"' <j/ ^ { — 1> 1}^- For N > 2 the vector 
pairs (<x, cr') span the space of all TV x N matrices, so Jij — Jji must be zero. For N = 1 there simply 
exists no non-symmetric synaptic matrix. In conclusion: for binary networks with parallel dynamics, 
interaction symmetry implies detailed balance, and vice versa. 

For sequential dynamics, with w(rj) = tanh 2 (??)], the transition matrix is given by @ and the 
detailed balance condition ([22]) simplifies to 



/i-i vi = m — vi — * or an a ari( i an 1 

cosh [phi[Fi(T)\ cosh [p/ij(cr)J 

Self-interactions J„, inducing hi(Fi<r) ^ hi(cr), complicate matters. Therefore we first consider sys- 
tems where all J a = 0. All stationary probabilities Poo(°") being non-zero (ergodicity), we may write: 

Poo (a) = /Ei^+§£i^^ CT i+*(°")] (26) 

Using relations like J2k^i JkiFi(a k cri) = J2k^i Jki<?k<?i-1<?iY.k^i [Jik + Jki]<?k we can simplify the de- 
tailed balance condition to K (i^cr) — K (cr) = (Ji2~2k^i[Jik — Jki]&k for all cr and all i. If to this 
expression we apply the general identity [1 — Fi\ /(cr) = 2<Ji(aif(cr)) (Ti we find for i ^ j: 

[Fj — l][Fi — l]K(cr) = —2<Ji<jj [Jij — Jji] for all cr and all i ^ j 

The left-hand side is symmetric under permutation of the pair (i, j), which implies that the interaction 
matrix must also be symmetric: Jij = Jji for all We now find the trivial solution K(cr) = K 

(constant), detailed balance holds and the corresponding equilibrium distribution is 

Peq(«r) ~ e-^ ff ) H(v) = ~Y t <r i J ij <r j -Y t e i a i (27) 

i^j i 

In conclusion: for binary networks with sequential dynamics, but without self- interactions, interaction 
symmetry implies detailed balance, and vice versa. In the case of self-interactions the situation is 
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more complicated. However, here one can still show that non-symmetric models with detailed balance 
must be pathological, since the requirements can be met only for very specific choices for the {Jij}. 

Detailed Balance for Networks with Continuous Neurons. Let us finally turn to the question of when we 
find microscopic equilibrium (stationarity without probability currents) in continuous models described 
by a Fokker-Planck equation (|TT1). Note that ([Tl] ) can be seen as a continuity equation for the density 
of a conserved quantity: ^gPt{&) + J2i gf^(°"> = 0- The components Ji(cr, t) of the current density 
are given by 

J i (a,t) = [f i (a)-T-^-]p t (a) 

Stationary distributions Poo(ff) are those which give J2i ^ _ ^i( <T ) 00 ) = (divergence-free currents). 
Detailed balance implies the stronger statement Jj(<r,oo) = for all i (zero currents), so fi(cr) = 
Tdlogpoo(cr)/do-i, or 

f l (a) = -dH(a)/da l , Poo (a) ~ e "^( ff ) (28) 

for some H(cr), i.e. the forces fi(cr) must be conservative. However, one can have conservative forces 
without a normalisable equilibrium distribution. Just take H(a) = 0, i.e. fi(a,t) = 0: here we have 
p eq (a) = C, which is not normalisable for <x £ 3l N . For this particular case equation (pd|) is solved 
easily: pt{cr) = [A7rTt]~ N ^ 2 J da' po(a')e~^ cr ~ cr ^ 2 / 4:Tt , so the limit limf_ >00 pf(<r) indeed does not exist. 
One can prove the following (see e.g. Q). If the forces are conservative and if poo(a) ~ e-^(^) is 
normalisable, then it is the unique stationary solution of the Fokker-Planck equation, to which the 
system converges for all initial distributions po G L 1 [5R Ar ] which obey f^da e^^pj^o") < oo. 

Assessing when our two particular model examples of graded response neurons or coupled oscillators 
obey detailed balance has thus been reduced mainly to checking whether the associated deterministic 
forces fi(cr) are conservative. Note that conservative forces must obey 

for all <x, for all i ^ j : dfi{a)/daj - dfj(a)/dai = (29) 

In the graded response equations (|lS|) the deterministic forces are fi(u) = J2j Jij tanh^iij] — Ui + 6i. 
Here dfi(u)/duj — dfj(u)/dui = j{ [1 — tanh 2 [7Mj] — Jj%[l — tanh 2 [7iij]}. At it = this reduces 
to Jij — Jji, i.e. the interaction matrix must be symmetric. For symmetric matrices we find away 
from u = 0: dfi(u)/duj — dfj(u)/dui = ^Jij{t&iih 2 [yui] — tanh 2 ['jUj]}. The only way for this to be 
zero for any u is by having Jij = for all i ^ j, i.e. all neurons are disconnected (in this trivial 
case the system jl~8| ) does indeed obey detailed balance). Network models of interacting graded- 
response neurons of the type (|l^) apparently never reach equilibrium, they will always violate detailed 
balance and exhibit microscopic probability currents. In the case of coupled oscillators (|l3|), where 
the deterministic forces are fi(<f>) = J2j Jij sin[(j)j — cpi] + u>i one finds the left-hand side of condition 
(P9|) to give dfi((f>)/d(j)j — dfj(di)/d<pi = [Jij — Jji] cos[<fij — fa]. Requiring this to be zero for any cf> 
gives the condition Ju = Jji for any i ^ j. We have already seen that symmetric oscillator networks 
indeed have conservative forces: fi(<f>) = —dH(<f>)/dfa, with H(<f>) = —^J2ij Jij cos [4 ) i~ ( t ) j]~J2i u! i ( l ) i- 
If in addition we choose all tUi = the function H(cr) will also be bounded from below, and, although 

Poo (0) ~ e-Wft is still not normalisable on £ 3? , the full 27r-periodicity of the function H(cr) 
now allows us to identify fa+2ir = fa for all i, so that now d) 6 [—71,^]^ and Jd(f> e~^ H<y ^ does exist. 
Thus symmetric coupled oscillator networks with zero natural frequencies obey detailed balance. In 
the case of non-zero natural frequencies, in contrast, detailed balance does not hold. 

Equilibrium Statistical Mechanics. The above results establish the link with equilibrium statistical 
mechanics (see e.g. [||, ^]). For binary systems with symmetric synapses (in the sequential case: 
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without self-interactions) and with threshold noise distributions of the form w(rj) = ^[1 — tanh 2 (ry)], 
detailed balance holds and we know the equilibrium distributions. For sequential dynamics it has the 
Boltzmann form ( |27| ) and we can apply standard equilibrium statistical mechanics. The parameter (5 
can formally be identified with the inverse 'temperature' in equilibrium, (3 = T _1 , and the function 
H(cr) is the usual Ising spin Hamiltonian. In particular we can define the partition function Z and 
the free energy F: 

p cq (<x) = |e-^W H(a) = -\Y. "iJiW ~ E ^ (30) 

Z = Y J <T m<T) F = -p- 1 logZ (31) 

rr 

The free energy can be used as the generating function for equilibrium averages. Taking derivatives 
with respect to external fields Q{ and interactions Jij, for instance, produces fa) = —dF/d9i and 
faffj) =—dF/dJij, whereas equilibrium averages of arbitrary state variable /(<x) can be obtained by 
adding suitable generating terms to the Hamiltonian: H(cr) — > H (<r) + Xf (<x) , (/) = lim^^o 9F/8X. 



In the parallel case (25) we can again formally write the equilibrium probability distribution in 



the Boltzmann form M and define a corresponding partition function Z and a free energy F: 

Pcq (cr) = leT^) H(a) = -^0 i a i -^Y f ]og2coBii\J3h i (tr)] (32) 

i % 

Z = Y J <T m{CT) F = -(i- 1 \ogZ (33) 

(T 

which again serve to generate averages: H{a) — > H{a) + A/(cr), (/) = limA-^o dF/dX. However, 
standard thermodynamic relations involving derivation with respect to (5 need no longer be valid, and 
derivation with respect to fields or interactions generates different types of averages, such as 

-dF/dBi = {a,) + (tanh[/^(o-)]) - 8F/dJ ti = fa tanh^er)]) 

i ^ j : - dF/dJij = {ai taah\phj(<r)]) + {crj tanh[(3 hi (a)]) 

One can use (crj) = (tanh[/3/ij(cr)]), which can be derived directly from the equilibrium equation 
Peqfa) = J2<t' W[cr; cr']p eq (cr), to simplify the first of these identities. 

A connected network of graded-response neurons can never be in an equilibrium state, so our only 
model example with continuous neuronal variables for which we can set up the equilibrium statistical 
mechanics formalism is the system of coupled oscillators (13) with symmetric synapses and absent (or 



uniform) natural frequencies U{. If we define the phases as 4>i £ [ _ " 7r ) 7r ] we have again an equilibrium 
distribution of the Boltzmann form, and we can define the standard thermodynamic quantities: 

Peq (0) = I e -^(0) H(4>) = ~\ E J u "xfa-W (34) 

ij 

r d<f> e-Wb F = -p- 1 log Z (35) 

These generate equilibrium averages in the usual manner. For instance (cos[<^j — 0,-]} = —dF/dJy, 
whereas averages of arbitrary state variables f(<f>) follow, as before, upon introducing suitable gener- 
ating terms: H(<f>) -> H(tf>) + Xf (tf>) , (/) = Hm x ^ dF/dX. 

In this chapter we restrict ourselves to symmetric networks which obey detailed balance, so that 
we know the equilibrium probability distribution and equilibrium statistical mechanics applies. In the 
case of sequential dynamics we will accordingly not allow for the presence of self- inter actions. 
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3 Simple Recurrent Networks with Binary Neurons 
3.1 Networks with Uniform Synapses 

We now turn to a simple toy model to show how equilibrium statistical mechanics is used for solving 
neural network models, and to illustrate similarities and differences between the different dynamics 
types. We choose uniform infinite-range synapses and zero external fields, and calculate the free 
energy for the binary systems (JUg), parallel and sequential, and with threshold noise distribution 
w (rj) = |[l-tanh 2 (r/)]: 

Jij = Jji = J/N (i / j), Ju = 9i = for all i 

The free energy is an extensive object, lirri7v-+oo F/N is finite. For the models (|l|^) we now obtain: 
Binary & Sequential : lim F/N = - lim {(3N)- 1 log^ e^l- 7 ™^ ")] 



CT 



Binary k Parallel : lim F/N = - lim (fiN)' 1 log V e ^P°g2co S h[/3Jm(<7)]] 

with the average activity m(er) = i Y] k cr fe . We have to count the number of states er with a prescribed 
average activity m = 2n/N — 1 (n is the number of neurons i with <7j = 1), in expressions of the form 

llog^e^Ko-)] = llogf; f ^] e^^-i] = ^log I' dm e Mio g 2-c*(m)+£/M] 

<T n=0 V / — 1 

lim ^-logV e W[m(cr)1 =log2+ max {[/[m] - c*(m)} 
AT— »oo N m6[— 1,1] 

with the entropic function c*(m) = i(l+m) log(l+m) + ^(l — m) log(l — m). In order to get there we 
used Stirling's formula to obtain the leading term of the factorials (only terms which are exponential 
in N survive the limit N — * oo), we converted (for N — ► oo) the summation over n into an integration 
over m = 2n/N — 1 G [—1, 1], and we carried out the integral over m via saddle-point integration (see 
e.g. H). This leads to a saddle-point problem whose solution gives the free energies: 

lim F/N = min / seq (m) /3/ seq (m) = c* (m) - log 2 - \f3 Jm 2 (36) 

iV^oo mG[— 1,1] Z 

lim F/N= min / par ( m -) /3/ par (m) = c*(m) — 2 log 2 — log cosh [/3Jm] (37) 

A^oo me[-l,l] 

The functions to be minimised are shown in figure 0. The equations from which to solve the minima 
are easily obtained by differentiation, using ^c*(m) = tanh _1 (m). For sequential dynamics we find 

Binary & Sequential : m = tanh[/3 Jm] (38) 

(the so-called Curie- Weiss law). For parallel dynamics we find 

m = tanh [(3 J tanh[/3 Jm]] 

One finds that the solutions of the latter equation again obey a Curie- Weiss law. The definition 
m = tanh[/3|J|m] transforms it into the coupled equations m = tanh[/3|J|m] and m = tanh[/3| J|m], 



3 SIMPLE RECURRENT NETWORKS WITH BINARY NEURONS 



14 



from which we derive < [m — m] 2 = [m — m] [tanh[/3| «/|m] — tanh[/3| J\m]] < 0. Since tanh[/?| J\m] is a 
monotonically increasing function of m, this implies m = m, so 

Binary &; Parallel : m = tanh[/?| J\m] (39) 

Our study of the toy models has thus been reduced to analysing the non-linear equations (|3^) and 
(|39[). If J > (excitation) the two types of dynamics lead to the same behaviour. At high noise levels, 
T > J, both minimisation problems are solved by m = (see figure ^), describing a disorganised 
(paramagnetic) state. This can be seen upon writing the right-hand side of (|38|) in integral form: 



m 2 



m tanh[/3 Jm] = j3Jm 2 I dz [1— tanh 2 [/3Jm2:]] < f3Jm 2 
Jo 



So m 2 [l — (3 J] < 0, which gives m = as soon as f3J < 1. A phase transition occurs at T = J (a 
bifurcation of non-trivial solutions of (|3~8|)), and for T < J the equations for m are solved by the two 



non-zero solutions of (35), describing a state where either all neurons tend to be firing (m > 0) or 
where they tend to be quiet {m < 0). This becomes clear when we expand ( |38| ) for small m: m = 
(9Jm+C(m 3 ), so precisely at (3 J = 1 one finds a de-stabilisation of the trivial solution m = 0, together 
with the creation of (two) stable non-trivial ones (see also figure |2|). Furthermore, using the identity 
c*(tanhx) = xtanhx— log cosh x, we obtain from ( j36i[37| ) the relation lmiAr^oo F/N = 21imjv^ 00 F/N. 
For J < (inhibition), however, the two types of dynamics give quite different results. For sequential 
dynamics the relevant minimum is located at m = (the paramagnetic state). For parallel dynamics, 
the minimisation problem is invariant under J — > —J, so the behaviour is again of the Curie- Weiss type 
(see figure ||| and equation (|^)), with a paramagnetic state for T > | J|, a phase transition at T = \ J\, 
and order for T < \J\. This difference between the two types of dynamics for J < is explained by 
studying dynamics. As we will see in a subsequent chapter, for the present (toy) model in the limit 
N — > oo the average activity evolves in time according to the deterministic laws 

-^m = tanh[/?Jm] — m m{t+l) = tanh[/3 Jm{t)\ 

for sequential and parallel dynamics, respectively. For J < the sequential system always decays 
towards the trivial state m = 0, whereas for sufficiently large (3 the parallel system enters the stable 
limit-cycle m(t) = Mg(— 1)' (where Mp is the non-zero solution of (j3S|)). The concepts of 'distance' 
and 'local minima' are quite different for the two dynamics types; in contrast to the sequential case, 
parallel dynamics allows the system to make the transition m — > — m in equilibrium. 

3.2 Phenomenology of Hopfield Models 

The Ideas Behind the Hopfield Model. The Hopfield model || is a network of binary neurons of the 
type with threshold noise w(r]) = — tanh 2 (r?)], and with a specific recipe for the synapses 



Jij aimed at storing patterns, motivated by suggestions made in the late nineteen- forties [10]. The 
original model was in fact defined more narrowly, as the zero noise limit of the system (|2|) , but the term 
has since then been accepted to cover a larger network class. Let us first consider the simplest case 
and try to store a single pattern £ E {— 1, 1}^ in noise-less infinite-range binary networks. Appealing 
candidates for interactions and thresholds would be Jij = and 8i = (for sequential dynamics we 
put Ju = for all i). With this choice the Lyapunov function ( |l7| ) becomes: 

l sc ^] = \n-\[Y^^ 1 ] 2 
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It will have to decrease monotonically during the dynamics, from which we immediately deduce 

5^(0) >0: er(oo)=£, ]T^(0)<0: <r(oo) = -£ 

i i 

This system indeed reconstructs dynamically the original pattern £ from an input vector <x(0), at 
least for sequential dynamics. However, en passant we have created an additional attractor: the state 
— £. This property is shared by all binary models in which the external fields are zero, where the 
Hamiltonians H(a) (^) and H(cr) (|32| ) are invariant under an overall sign change cr — ► —rr. A second 
feature common to several (but not all) attractor neural networks is that each initial state will lead 
to pattern reconstruction, even nonsensical (random) ones. 

The Hopfield model is obtained by generalising the previous simple one-pattern recipe to the case 
of an arbitrary number p of binary patterns ^ = (£f , . . . , G { — 1, 1}^: 

1 p 

Jij = — ^2 Ci^ji 0i = for all i (sequential dynamics : J a — > for all i) (40) 

,u=l 

The prefactor TV -1 has been inserted to ensure that the limit N —>■ oo will exist in future expressions. 
The process of interest is that where, triggered by correlation between the initial state and a stored 
pattern £ A , the state vector a evolves towards £ A . If this happens, pattern £ A is said to be recalled. 
The similarity between a state vector and the stored patterns is measured by so-called overlaps 

= ^Etf** ( 41 ) 

i 

Numerical simulations illustrate the functioning of the Hopfield model as an associative memory, and 
the description of the recall process in terms of overlaps. Our simulated system is an N = 841 
Hopfield model, in which p = 10 patterns have been stored (see figure ||) according to prescription 



(40). The two-dimensional arrangement of the neurons in this example is just a guide to the eye; since 
the network is fully connected the physical location of the neurons is irrelevant. The dynamics is as 
given by (||), with T = 0.1. In figure || we first show (left column) the result of letting the system 
evolve in time from an initial state, which is a noisy version of one of the stored patterns (here 40% 
of the neuronal states Oi where corrupted, according to &{ — >— <ji). The top left row of graphs shows 
snapshots of the microscopic state as the system evolves in time. The bottom left row shows the values 
of the p = 10 overlaps m«, as defined in (^l[), as functions of time; the one which evolves towards 
the value 1 corresponds to the pattern being reconstructed. The right column of figure |] shows a 
similar experiment, but here the initial state is drawn at random. The system subsequently evolves 
towards a mixture of the stored patterns, which is found to be very stable, due to the fact that the 
patterns involved (see figure ||) are significantly correlated. It will be clear that, although the idea of 
information storage via the creation of attractors does work, the choice ( f40| ) for the synapses is still 
too simple to be optimal; in addition to the desired states ^ and their mirror images — even more 
unwanted spurious attractors are created. Yet this model will already push the analysis to the limits, 
as soon as we allow for the storage of an extensive number of patterns 

Issues Related to Saturation: Storage Capacity & Non-Trivial Dynamics. In our previous simulation 
example the loading of the network was modest; a total of ^N(N-l) = 353,220 synapses were used 
to store just pN = 8,410 bits of information. Let us now investigate the behaviour of the network 
when the number of patterns scales with the system size as p = aN (a > 0); now for large iV the 
number of bits stored per synapse will be pN/^N(N — 1) ~ 2a. This is called the saturation regime. 
Again numerical simulations, but now with finite a, illustrate the main features and complications 
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of recall dynamics in the saturation regime. In our example the dynamics are given by (|l|) (parallel 
updates), with T = 0.1 and threshold noise distribution w(rj) = ^[1— tanh 2 (77)]; the patterns are chosen 
randomly. Figure |5] shows the result of measuring in such simulations the two quantities 

m = mi(ff) r = a" 1 ^ m^(tr) (42) 



following initial states which are correlated with pattern only. For large N we can distinguish 
structural overlaps, where m M (cr) = 0(1), from accidental ones, where m»((r) = 0(N~2) (as for a 
randomly drawn er). Overlaps with non-nominated patterns are seen to remain 0(N~2), i.e. r(t) = 
0(1). We observe competition between pattern recall (m — > 1) and interference of non-nominated 
patterns (m — > 0, with r increasing), and a profound slowing down of the process for non-recall 
trajectories. The initial overlap (the 'cue') needed to trigger recall is found to increase with increasing a 
(the loading) and increasing T (the noise). Further numerical experimentation, with random patterns, 
reveals that at any noise level T there is a critical storage level a c (T) above which recall is impossible, 
with an absolute upper limit of a c = max^ a c (T) = a c (0) ~ 0.139. The competing forces at work are 
easily recognised when working out the local fields (jl|), using (|4C|): 

hi{a) = £mi(«r) + £*fa + O^ 1 ) (43) 

The first term in ( fi"3"| ) drives a towards pattern as soon as mi(<r) > 0. The second terms represent 
interference, caused by correlations between <x and non-nominated patterns. One easily shows (to be 
demonstrated later) that for N — > 00 the fluctuations in the values of the recall overlap m will vanish, 
and that for the present types of initial states and threshold noise the overlap m will obey 

m(t+l) = JdzP t {z)t^[f3{m(t)+z)} P t {z) = Yirn^ 1 - 1 ^ E (44) 

If all (Jj(O) are drawn independently, Prob[<r^(0) = rb^] = i[l ±m(0)], the central limit theorem states 
that Pq{z) is Gaussian. One easily derives (z) = and (z 2 )o = a, so at t = equation (E4]) gives 

m(l) = /"-i e"5 22 tanh[/3(m(0) + zya)] (45) 

J V 27T 



The above ideas, and equation (Ha) in particular, go back to ||ITJ. For times t > 0, however, the 
independence of the states Oi need no longer hold. As a simple approximation one could just assume 
that the Oi remain uncorrelated at all times, i.e. Prob[cjj(i) = ±£*] = \ \V ± m(t)\ for all t > 0, such 
that the argument given for i = would hold generally, and where (for randomly drawn patterns) the 
mapping (]45l) would describe the overlap evolution at all times: 

m(t + 1) = /~7^= e"^ 2 tanh[/3(m(t) + z\/a)] (46) 

J \/2lT 



This equation, however, must be generally incorrect. Firstly, figure |5| already shows that knowledge 
of m(t) only does not yet permit prediction of m(t + l). Secondly, upon working out its bifurcation 
properties one finds that equation (|46|) predicts a storage capacity of a c = 2 /it m 0.637, which is 
no way near to what is actually being observed. We will see in the paper on dynamics that only 
for certain types of extremely diluted networks (where most of the synapses are cut) equation (|46| ) 
is indeed correct on finite times; in these networks the time it takes for correlations between neuron 
states to build up diverges with N, so that correlations are simply not yet noticable on finite times. 
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For fully connected Hopfield networks storing random patterns near saturation, i.e. with a > 
0, the complicated correlations building up between the microscopic variables in the course of the 
dynamics generate an interference noise distribution which is intrinsically non-Gaussian, see e.g. figure 
||. This leads to a highly non-trivial dynamics which is fundamentally different from that in the 
limjsr^oop/N = regime. Solving models of recurrent neural networks in the saturation regime 
boils down to calculating this non-Gaussian noise distribution, which requires advanced mathematical 
techniques (in statics and dynamics), and constitutes the main challenge to the theorist. The simplest 
way to evade this challenge is to study situations where the interference noise is either trivial (as 
with asymmetric extremely diluted models) or where it vanishes, which happens in fully connected 
networks when a = lim.N-,00 p/N = (as with finite p). The latter a = regime is the one we will 
explore first. 



3.3 Analysis of Hopfield Models Away Prom Saturation 

Equilibrium Order Parameter Equations. A binary Hopfield network with parameters given by fl40f ) 
obeys detailed balance, and the Hamiltonian H(a) (|30|) (corresponding to sequential dynamics) and 
the pseudo-Hamiltonian H(cr) ( |32| ) (corresponding to parallel dynamics) become 



H(a) = -^V£ mfca) + ±p H(a) = -- £log 2 cosh[/3 £ (47) 

with the overlaps (f4l|). Solving the statics implies calculating the free energies F and F: 
F = _ L g V e -m*) p = - \ log V e-^) 

Upon introducing the short-hand notation m = (mi, . . . , m p ) and £ t = (£* , . . . , £?), both free energies 
can be expressed in terms of the density of states V{m) = 2~ N Yl a 5[m — m(cr)]: 

F/N = -1 log2 - log j dm V(m) e^ Nm? + ^ (48) 

F/N = log 2 - log J dm V(m) e^ti log 2 cosh [^-m] ( 49 ) 

(note: Jdm 5[m — m(cr)] = 1). In order to proceed we need to specify how the number of patterns 
p scales with the system size N. In this section we will follow |T^] (equilibrium analysis following 
sequential dynamics) and (l^] (equilibrium analysis following parallel dynamics), and assume p to be 
finite. One can now easily calculate the leading contribution to the density of states, using the integral 
representation of the 5- function and keeping in mind that according to (48,4^) only terms exponential 
in N will retain statistical relevance for N — > 00: 

lim — logX>(m)= lim —log fdx e iNx - m (e'^ti ^ x ) a 

JV— >oo N V ' N^ooN J X ' 

1 f N{iX-m+{logcosi£-X})c] 

= lim — log dx e s 

jV^oc N J 

with the abbreviation («£(£))£ = liniTv^oo Y^iLi The leading contribution to both free en- 

ergies can be expressed as a finite-dimensional integral, for large N dominated by that saddle-point 
(extremum) for which the extensive exponent is real and maximal: 

lim F/N = -— log / dmdx e ~NPf(m,x) = extTxm f( m ,x) 

N^oo jjN J 
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lim FIN = -— log / dmdx e ~ N Ph m > x ) = extr^ m f(m, x) 

N^oo fjN J 

with 

f(m,x) = — \m 2 — ix ■ m — (log 2 cos • 

/(m, x) = — /3 _1 (log2 cosh • m])^ — ix ■ m — /3 _1 (log 2 cos ■ 

The saddle-point equations for / and / are given by: 

/: x = im, im = (£ tan [j3£ • x] ) ^ 

/: s = i(£tanh • m])^, im = (£tan ■ sc])> 

In saddle-points a; turns out to be purely imaginary. However, after a shift of the integration contours, 
putting x = ix*{m) + y (where ix*(m) is the imaginary saddle-point, and where y € $t p ) we can 
eliminate x in favor of y E which does have a real saddle-point, by construction.^ We then obtain^ 

Sequential Dynamics : m = (£tanh[/3£ • m])g 

Parallel Dynamics : m = (£tanh[/3£ • [(£' tanh[/3£' • 

(compare to e.g. (38,39)). The solutions of the above two equations will in general be identical. To 
see this, let us denote rh = (£tanh • m])g, with which the saddle point equation for / decouples 
into: 

m = (£ tanh [/3£ • rh] )t rh = (£ tanh • m] ) ^ 

so 

[m — rh] 2 = ([(£ • m) — (£ • rh)] [tanh(/3£ • m)— tanh(/3£ • m)])^ 

Since icm/i is a monotonicaly increasing function, we must have [m — rh] • £ = for each £ that 
contributes to the averages (•■•}£■ F° r aii choices of patterns where the covariance matrix C^ u = 
(£,n^u)^ is positive definite, we thus obtain m = rh. The final result is: for both types of dynamics 
(sequential and parallel) the overlap order parameters in equilibrium are given by the solution m* of 

m = (£tanh[/3£-ra])t (50) 

which minimises^ 

/(m) = X -m 2 - 1 (log 2 cosh [/?£ • m])| (51) 

The free energies of the ergodic compoments are limjv— >oo F/N = f(m*) and limjy^oo F/N = 2/(m*). 
Adding generating terms of the form H — * H + \g[m(cr)] to the Hamiltonians allows us identify 
(g[m(cr)]) cq = lim\^odF/dA = g[m*]. Thus, in equilibrium the fluctuations in the overlap order 
parameters m(<x) (Elf) vanish for ./V — > oo. Their deterministic values are simply given by m* . Note 



1 Our functions to be integrated have no poles, but strictly speaking we still have to verify that the integration segments 
linking the original integration regime to the shifted one will not contribute to the integrals. This is generally a tedious 
and distracting task, which is often skipped. For simple models, however (e.g. networks with uniform synapses), the 
verification can be carried out properly, and all is found to be safe. 

2 Here we used the equation df(m,x)/dm = to express x in terms of m, because this is simpler. Strictly speaking 
we should have used df(m, x)/dx = for this purpose; our short-cut could in principle generate additional solutions. In 
the present model, however, we can check explicitly that this is not the case. Also, in view of the imaginary saddle-point 
x, we can not be certain that, upon elimination of x, the relevant saddle-point of the remaining function f(m) must be 
a minimum. This will have to be checked, for instance by inspection of the T — > oo limit. 

3 We here indeed know the relevant saddle-point to be a minimum: the only solution of the saddle-point equations at 
high temperatures, m = 0, is seen to minimise f(m), since /(m)+/3 _1 log2 = |m 2 (l- /3) + C9(/3 3 ). 
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that in the case of sequential dynamics we could also have used linearisation with Gaussian integrals 
(as used previously for coupled oscillators with uniform synapses) to arrive at this solution, with p 
auxiliary integrations, but that for parallel dynamics this would not have been possible. 

Analysis of Order Parameter Equations: Pure States & Mixture States. We will restrict our further 
discussion to the case of randomly drawn patterns, so 

£e{-i,ip 

(generalisation to correlated patterns is in principle straightforward). We first establish an upper 
bound for the temperature for where non-trivial solutions m* could exist, by writing ( |50[ ) in integral 
form: 

m ll = P(Z li (t-m) J (iA[l-tanh 2 [/3A£-m]])£ 
from which we deduce 1 

= m 2 -/3((£ ■ m) 2 ^ d\[l-tanh 2 [j3\£ • m]]>£ > m 2 -0{(£ ■ m) 2 )^ = m 2 (l-(3) 



For T > 1 the only solution of (|50|) is the paramagnetic state m = 0, which gives for the free energy 
per neuron — Tlog2 and — 2Tlog2 (for sequential and parallel dynamics, respectively). At T = 1 a 
phase transition occurs, which follows from expanding (|50|) for small \m\ in powers of r = j3— 1: 



1 2 

m M = (l+r)m M - - ^ m u m p m x (^^x)^ + C(m 5 ,rm 3 ) = m At [l + r-m 2 + -m 2 ] + 0(m 5 ,rr™ 3 ) 

The new saddle-point scales as m M = m^r 1 / 2 -!- 0(r 3 / 2 ), with for each fi: m fl = or = 1— m 2 + §rn. 2 . 
The solutions are of the form rh^ S {— rh, 0,rh}. If we denote with n the number of non-zero compo- 
nents in the vector rh, we derive from the above identities: = or rh^ = ±V3/y/3n~^2. These 
saddle-points are called mixture states, since they correspond to microscopic configurations correlated 
equally with a finite number n of the stored patterns (or their negatives). Without loss of generality we 
can always perform gauge transformations on the set of stored patterns (permutations and reflections), 
such that the mixture states acquire the form 

n times p—n times 

m = m n (T^~l, OT^TO) m n = [-^—] 1 2((3-l) 1/2 + ... (52) 

in — 1 

These states are in fact saddle-points of the surface f(m) (|5l|) for any finite temperature, as can be 
verified by substituting ( |52[ ) as an ansatz into 



fi<n: m n = tanh[/3m n ^ fi > n : = tanh[/3m n &,])| 



The second equation is automatically satisfied since the average factorises. The first equation leads to 
a condition determining the amplitude m n of the mixture states: 

m n = ([- £d tanh[/3m n ^ (53) 
The corresponding values of f(m), to be denoted by f n , are 

/n = 2 rtm n _ 4( lo § 2 cosh[/3m n ^ (54) 
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The relevant question at this stage is whether or not these saddle-points correspond to local minima 
of the surface f(m) (|5lD. The second derivative of f(m) is given by 

>h J ' m ' " V - 0{^ v \l - tanh 2 \p£ • m]l )t (55) 



dm p dm 



(a local minimum corresponds to a positive definite second derivative). In the trivial saddle-point 
m = this gives simply 8 pu {l—j3), so at T = 1 this state destabilises. In a mixture state of the type 
( |52[ ) the second derivative becomes: 

D$ = <W - /3<£ M 6,[l-tanh 2 [/?m n £ ^]])^ 

Due to the symmetries in the problem the spectrum of the matrix can be calculated. One finds 
the following eigenspaces, with Q = (tanh 2 [/3m n Y, p < n C P ])p and R = (£i£ 2 tanh 2 [/5m n Y, p < n £p])£ : 

Eigenspace : Eigenvalue : 

J: a; = (0, . . . ,0,x n+ i, . . . ,x p ) 1-/?[1-Q] 

II: x = (1,...,1,0,...,0) l-/3[l-Q + (l-n)i?] 

///: x = (x 1 ,...,x n ,0,...,0), E M x M = l-/3[l-Q+i?] 

Eigenspace 1/7 and the quantity R only come into play for n > 1. To find the smallest eigenvalue we 
need to know the sign of R. With the abbreviation = Y^ p<n £ p we find: 

n(n-l)R = {Ml tanh 2 [(3m n M^\)^ - n(tanh 2 [/3m n M^])^ 
= ([M^-(M 2 ^)^} tanh 2 [f3m n \Mg\])g 



= ([M 2 -(M 2 ,)^] |tanh 2 [/3m nv /M 2 ] - tanh 2 [/?m n ^(M 2 ,)^]|)^ > 

We may now identify the conditions for an n- mixture state to be a local minimum of f(m). For 
n = 1 the relevant eigenvalue is /, now the quantity Q simplifies considerably. For n > 1 the relevant 
eigenvalue is III, here we can combine Q and R into one single average: 

n = 1 : l-/3[l-tanh 2 [/3mi]] > 



n = 2 
n > 3 



1-13 > 

l-/3[l-(tanh 2 [/?m n Ep= 3 ep])^ > 



The n = 1 states, correlated with one pattern only, are the desired solutions. They are stable for all 
T < 1, since partial differentiation with respect to /3 of the n = 1 amplitude equation (|53| ) gives 

mi = tanh[/?mi] — ► 1 — tanh 2 [/?mi]] = mj[l— tanh 2 [/3mi]](9mi/9/3) _1 

(clearly sgn[mi] = sgn[<9mi/d/3]). The n = 2 mixtures are always unstable. For n > 3 we have 
to solve the amplitude equations ( |53| ) numerically to evaluate their stability. The result is shown in 
figure 0, together with the corresponding 'free energies' f n (|54|). It turns out that only for odd n will 
there be a critical temperature below which the n-mixture states are local minima of f(m). From 
figure we can also conclude that, in terms of the network functioning as an associative memory, 
noise is actually beneficial in the sense that it can be used to eliminate the unwanted n > 1 ergodic 
components (while retaining the relevant ones: the pure n = 1 states). In fact the overlap equations 
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( pO[) do also allow for stable solutions different from the n-mixture states discussed here. They are in 
turn found to be continuously bifurcating mixtures of the mixture states. However, for random (or 
uncorrelated) patterns they come into existence only near T = and play a marginal role; phase space 
is dominated by the odd n-mixture states. 

We have now solved the model in equilibrium for finite p and N — > oo. Most of the relevant 
information on when and to what extent stored random patterns will be recalled is summarised in 
figure [?]. For non-random patterns one simply has to study the bifurcation properties of equation ( |50| ) 
for the new pattern statistics at hand; this is only qualitatively different from the random pattern 
analysis explained above. The occurrence of multiple saddle-points corresponding to local minima 
of the free energy signals ergodicity breaking. Although among these only the global minimum will 
correspond to the thermodynamic equilibrium state, the non-global minima correspond to true ergodic 
components, i.e. on finite time-scales they will be just as relevant as the global minimum. 



4 Simple Recurrent Networks of Coupled Oscillators 
4.1 Coupled Oscillators with Uniform Synapses 

Models with continuous variables involve integration over states, rather than summation. For a coupled 
oscillator network (|l~3| ) with uniform synapses Jij = J/N and zero frequencies u>i = (which is a simple 
version of the model in |l4|]) we obtain for the free energy per oscillator: 

lim F/N=- lim J- log f • • • f# e 0W/^[C*«- <*)I a +E,"K*)l a ] 

We would now have to 'count' microscopic states with prescribed average cosines and sines. A faster 
route exploits auxiliary Gaussian integrals, via the identity 

eh 2 = Jdz e yz (56) 

1 12 

with the short-hand Dx = (2-7r) _ 2 e~ 2^ dx (this alternative would also have been open to us in the 
binary case; my aim in this section is to explain both methods): 

lim F/N = - lim -J- log [ % ■ ■ ■ T d<f> [ DxDy e sin^)] 



-.- lim 4- log / DxDy \ H dcf> e ™WVW^W)/N 
lim J_lo g f°°dq qe-^ N ^ 2 [ f# e^^^^ 



N 



where we have transformed to polar coordinates, (x,y) = qyp\ J\ N(cos 9, sin 6), and where we have 
already eliminated (constant) terms which will not survive the limit N — * 00. Thus, saddle-point 
integration gives us, quite similar to the previous cases (36|.37|): 

J>0: (3f(q)= 1 1 (3\J\q 2 -log[27Tl (P\J\q)} 
hm F UN = mm t{q) 57 
N ^°° ^° J<0: Pf(q) = ^\J\q 2 -log[27rI (i(]\J\q)] 

in which the I n (z) are the Bessel functions (see e.g. ]15[ ). The function /(g) is shown in figure ||. The 

d 

dz J 



equations from which to solve the minima are obtained by differentiation, using 4-Iq(z) = I\(z): 



T ^ n hW\J\q) T ^ n . h{i[3\J\q) (Ka , 

J>0: 1= T T»\ 71 ^ J < : q = 1 (58 

Io(P\J\q) I (iP\J\q) 
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Again, in both cases the problem has been reduced to studying a single non-linear equation, 
physical meaning of the solution follows from the identity —2dF/dJ = (N^ 1 Y^i^j cos(0j— (j>j))\ 



The 



lim ([— 



^cosOi)] 2 



+ lim ([— V 



sm(c/>;)] 2 ) = sgn(J) q 2 



From this equation it also follows that q < 1. Note: since df{q)/dq = at the minimum, one 
only needs to consider the explicit derivative of f{q) with respect to J. If the synapses induce anti- 

0, the only solution of (|5£ 



synchronisation, J < 0, the only solution of ( |58| ) (and the minimum in (57)) is the trivial state 
q = 0. This also follows immediately from the equation which gave the physical meaning of q. For 
synchronising forces, J > 0, on the other hand, we again find the trivial solution at high noise levels, 
but a globally synchronised state with q > at low noise levels. Here a phase transition occurs at 
T = ^ J (a bifurcation of non-trivial solutions of (^)), and for T < ^ J the minimum of ( |57"|) is found 
at two non-zero values for q. The critical noise level is again found upon expanding the saddle-point 
equation, using I (z) = l + 0{z 2 ) and h(z) = \z + 0(z z ): q = \[3Jq + 0(q 3 ). Precisely at j3J = 2 
one finds a de-stabilisation of the trivial solution q = 0, together with the creation of (two) stable 
non-trivial ones (see figure §). Note that, in view of (]57|), we are only interested in non- negative 
values of q. One can prove, using the properties of the Bessel functions, that there are no other 
(discontinuous) bifurcations of non-trivial solutions of the saddle-point equation. Note, finally, that 
the absence of a state with global anti-synchronisation for J < has the same origin as the absence 
of an anti- ferromagnetic state for J < in the previous models with binary neurons. Due to the 
long-range nature of the synapses Jij = J/N such states simply cannot exist: whereas any set of 
oscillators can be in a fully synchronised state, if two oscillators are in anti-synchrony it is already 
impossible for a third to be simultaneously in anti-synchrony with the first two (since anti-synchrony 
with one implies synchrony with the other). 



4.2 Coupled Oscillator Attractor Networks 

Intuition & Definitions. Let us now turn to an alternative realisation of information storage in a 
recurrent network based upon the creation of attractors. We will solve models of coupled neural 
oscillators of the type (|l3|), with zero natural frequencies (since we wish to use equilibrium techniques), 
in which real- valued patterns are stored as stable configurations of oscillator phases, following [fig]. 
Let us, however, first find out how to store a single pattern £ G [— 7r,7r] in a noise-less infinite- 
range oscillator network. For simplicity we will draw each component £j independently at random 
from [— 7r,7r], with uniform probability density. This allows us to use asymptotic properties such as 



0(N 2 ) for any integer £. A sensible choice for the synapses would be Jij = cos[£j— £ 



To see this we work out the corresponding Lyapunov function ( 



L[4>] 



1 



2N 2 



cos [& - £j] cos [fa -4>j] 



2N 2 



Yl cos2 



\ + 0{N~ 1 r 



(the factors of iV have been inserted to achieve appropriate scaling in the N — > oo limit). The function 
L[cj>], which is obviously bounded from below, must decrease monotonically during the dynamics. To 
find out whether the state £ is a stable fixed-point of the dynamics we have to calculate L and 
derivatives of L at 4> = 



dL 

d<t>i 



1 

2iV2 



£sin[2(£Ki)] 



d 2 L 



1 

N2 



^cos 2 [£Kj] i^j : 



d 2 L 



1 



cos 2 [&-£/] 
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Clearly limjv^oo L[£] = - \. Putting d> = £ + A0, with A<fo = O(N ), we find 

/9L 1 (9 2 / 
^K+A0] - Lfc] = £ Afc — |^ + - £ A^,-— + 0(A0 3 ) 



ij 



= 4^ E A <^ - ^2 E A ^ A< ^ cos2 fe + A0 3 ) 

i ij 

= I {^E A ^ " t^E A ^] 2 " t^E A ^cos(2^)] 2 - [l£ A«^sin(2&)] 2 } + 0(J\T*, A0 3 ) 

(59) 

In leading order in N the following three vectors in $l N are normalised and orthogonal: 
1 

ei = -7=(1, !,•••,!), e 2 = -=(cos(2fi), . . . , cos(2£tv)), e 2 = -=(sin(2£i), . . . , sin(2f jv)) 
V iv v TV V i V 



We may therefore use A(f> > (A0-ei) +(A0-e2) + (A0-e3) , insertion of which into (^) leads to 
L[S+A4>]-L[S] > [^^A0 i cos(2^)] 2 + [^E A ^ sin ( 2 ^)] 2 + O ( iV "^ A ^ 3 ) 

i i 

Thus for large N the second derivative of L is non- negative at = £, and the phase pattern £ has 
indeed become a fixed-point attractor of the dynamics of the noise-free coupled oscillator network. 
The same is found to be true for the states cf> = ±£+a(l, . . . , 1) (for any a). 

Storing p Phase Patterns: Equilibrium Order Parameter Equations. We next follow the strategy of 
the Hopfield model and attempt to simply extend the above recipe for the synapses to the case of 
having a finite number p of phase patterns £ M = . . . , S [— n, n) , giving 

J * = jf E - 1 (6°) 

(the factor TV, as before, ensures a proper limit N — > oo later). In analogy with our solution of the 
Hopfield model we define the following averages over pattern variables: 

(SK]>£ = Jjm^ffo]. = (eh • • • ^f) G [— 7T, 7r] P 

We can write the Hamiltonian H(cj)) of (|34|) in the form 

H W = -^EE^M "^-^] = "Y E {< c (0) 2 +< s (0) 2 +m^ c (0) 2 +m^(0) 2 } 

Ui " <l! = ^E COS (^) COS (^) m cs(<t>) = ^E cos fe M ) sm ( ( ? :) i) ( 61 ) 

<c(0) = ^E sin (C) cos (^) m W) = ^E sin fe M ) sin (^) (62) 

i i 

The free energy per oscillator can now be written as 

F/iv = -^Liog |...|^ e ^) = _-Lio g j...j^ e WT.J^^? 
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with ★* G {cc, ss, cs, sc}. Upon introducing the notation m** = (m 1 ^, . . . , m**) we can again express 
the free energy in terms of the density of states V({m*+}) = (2ir)~ N f---fd<f> n** 5[m ick —m irk (<T)]: 

F/N = -^log(2vr) - -^log jj\d m ^ V{{m^})e^ N ^ m ** (63) 

Since p is finite, the leading contribution to the density of states (as N — > do), which will give us the 
entropy, can be calculated by writing the <5-functions in integral representation: 



1 1 f 

lim — log2?({m**}) = Jim —log /TT [dec** e l 



-i J2i J2j x cc cos (?f ) «»(0i)+a& cos(£f ) sin(&)+a£ e sin(£f ) cos(^ 4 )+a^ s sin(£f ) siu(<fo)] 

(2tt)^ 



extr {»**} | « n**-™** + ( io g y ^ e 



— « ^ [^c cos(^) cos^+rr^ cos(^) sin(0)+x^ c sin(^) cos(</>)+£ss sin(^) sin(</>)] 



The relevant extremum is purely imaginary so we put x++ = if3y++ (see also our previous discussion 
for the Hopfield model) and, upon inserting the density of states into our original expression for the 
free energy per oscillator, arrive at 

Jim^F/TV = extr {m ^ y ^ } /({m**, y**}) 
1 1 

/({m**, y**}) = — - log(2vr) - - ^ m*^ + ^ y**m** 
P z ** 

— — doff f — e 13 cos (Sm) cos (<M+2/cs cos(£ M ) sin((/>)+j/^ c sin(£ M ) cos(c£)+j/£ s sin(g M ) sin(0)] v 

/T & J 2vr '£ 

Taking derivatives with respect to the order parameters m** gives us y^ = m**, with which we can 
eliminate the y++. Derivation with respect to the m** subsequently gives the saddle-point equations 

fdrh rnsUlp^ cos [ <?i ]S I ,[ m cc cos [^]+ m L sin [^]]+/ 3sin M XU m cs cos [^]+ m L sin[£„]] 

m M = / cos rt ] J " y w>m c v / 64 \ 

CC N LSMJ e ^cosHX;j^ c cos[^]+m^sin[^]]+/3sin[</.]X;jm^co S [^]+m^sin[^]] '€ V 7 

fdo!) sinf(*le /3 COS ^ S„[ m c C cos K"]+ m sc sin[£„]]+/?sii#] 5Z v [m^ cos[£„]+mJ a sin[^]] 

Yfi^ = I coslf' 1 )t (65) 

cs X L ^ J e /3cos[^X;„[^ c co S [e„]+mJ cS in[^]]+/3rin[^X;„[m^co S [^]+m^siii[^]] 7 € v 7 

fdoi COsf<#le^ C ° S ^ 2„[ m cc cos[?„]+m£,. sin[£„]]+/?sin[<£] X^[ m cs cos K"]+ m L sin[^]] 

77i^ = / sinff 1 - — )t (66) 

&C X LSAtJ e /3co S [</.]^J m - c cos[^]+ m - c sin[^]]+/3 S inM^Jm- s cos[^]+m- s sin[^]] 7 € V ; 

fc/(/> single 18 cos '^ S^[ m cc cos [^]+ m ^ sin [^]]+/ 3sin M2^^[ m cs cos [^]+ m ^ si n[^ 

m M — / s i n [f 1 J__L_!__! \ t 



The equilibrium values of the observables m**, as defined in ( pl| , p2; ), are now given by the solution of 
the coupled equations (|64|-|67|) which minimises 



/({m**}) = -Y^m^ - -(log ( d(j) e ^ cos MEJ< c cos [^]+ m sc s M€^^ 

(68) 
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We can confirm that the relevant saddle-point must be a minimum by inspecting the [3 = limit 
(infinite noise levels): lim^o f({ m **}) = \ E** m l* ~ \ l°g( 2vr )- 

Analysis of Order Parameter Equations: Pure States. From now on we will restrict our analysis to 
phase pattern components which have all been drawn independently at random from [— tt, tt], with 
uniform probability density, so that (g[$])^ = (27r)~ p J^ w . . .J^ n d£ g[£\. At [3 = (T = oo) one finds 
only the trivial state = 0. It can be shown that there will be no discontinuous transitions to a non- 
trivial state as the noise level (temperature) is reduced. The continuous ones follow upon expansion 
of the equations (64-37) for small {m**}, which is found to give (for each \x and each combination **): 

m^ = \(3m^ + 0({ml}) 

Thus a continuous transition to recall states occurs at T = \. Full classification of all solutions of 



(64-67) is ruled out. Here we will restrict ourselves to the most relevant ones, such as the pure states, 
where m** = m^SaX (for some pattern label A). Here the oscillator phases are correlated with only 
one of the stored phase patterns (if at all). Insertion into the above expression for /({m**}) shows 
that for such solutions we have to minimise 

1 2 _ jj_ f d£ i f j i /3cos[</)][m cc cos[5]+m sc sin[5]]+/3sin[(/)][m cs cos[g]+m ss sin[, 



/({to**}) = - ^ — J lo g y# e ^ CM M[ m ccCosK]+m S csin[f]]+/3sm[0][m cs cos[^+m ss smK]] ^gg-j 

We anticipate solutions corresponding to the (partial) recall of the stored phase pattern £ A or its mirror 
image (modulo overall phase shifts £j — * £i + <5, under which the synapses are obviously invariant). 
Insertion into (54-67) of the state fa = $ + 5 gives (m cc ,m sc ,m cs ,m ss ) = | (cos 5,— sin 6, sin 5, cos S). 



Similarly, insertion into (|64|-|67D of fa =—£i+S gives (m cc ,m sc ,m cs ,m ss ) = i (cos 5, sin S, sin 5,— cos S). 
Thus we can identify retrieval states as those solutions which are of the form 

(i) retrieval of £ A : (m cc , m sc , m cs ,m ss ) = m(cos 5,— sin 5, sin 6, cos 6) 

(ii) retrieval of — £ A : {m cc , m sc , m cs , m ss ) = m(cos 5, sin 5, sin 5,— cos S) 

with full recall corresponding to m = |. Insertion into the saddle-point equations and into (|69|), 
followed by an appropriate shift of the integration variable 0, shows that the free energy is independent 
of 5 (so the above two ansatze solve the saddle-point equations for any 8) and that 



1 cos^e^™ 008 !' 

2 Jd(j) e /3mcos[0] ' " " "■ (3 



III — — ; . f( m ) = m2 — -q log J d(j) C 



f)m cos [i 



Expansion in powers of to, using log(l+-z) = z — ^z 2 + 0(z 3 ), reveals that non-zero minima m indeed 
bifurcate continuously at T = (3~ l = \: 

/(m) + 4log[27r] = (l-i/?)m 2 + i-/? 3 m 4 + 0(m 6 ) (70) 
[3 4 64 

Retrieval states are obviously not the only pure states that solve the saddle-point equations. The 
function (^) is invariant under the following discrete (non-commuting) transformations: 

I. (to cc , to sc , to cs , to ss ) > {rfi cc ^m sc ^ m CSl to ss ) 

II . ij^cci Tflsci ^Y^csi ^ss) ^ ij^cs ) Tr^ss j ^CC5 ^sc) 
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We expect these to induce solutions with specific symmetries. In particular we anticipate the following 
symmetric and anti-symmetric states: 

(m) symmetric under I : (m cc ,m sc ,m cs ,m ss ) = ^/2m{cos 5, sin S, 0, 0) 

(iv) antisymmetric under I : (m cc , m sc , m cs , m ss ) = y2m(0, 0, cos 5, sin 6) 

(v) symmetric under II : (rn cc ,m sc , m cs ,m ss ) = m (cos 5, sin S, cos S, sin 5) 

(vi) antisymmetric under II : (m cc , m sc , m cs ,m ss ) = m(cos 5, sin 5, — cos 5, — sin S) 



Insertion into the saddle-point equations and into ( |69| ) shows in all four cases the parameter 5 is 
arbitrary and that always 



m 1 /« cos|{1 J#c^~^ = i , ^^w^, 

Expansion in powers of m reveals that non-zero solutions m here again bifurcate continuously at 



f(m) + 1 logpvr] = (l-^/5)m 2 + ^^ 3 m 4 + 0(m 6 ) (71) 



r = J= 



However, comparison with ( [TOD shows that the free energy of the pure recall states is lower. Thus the 
system will prefer the recall states over the above solutions with specific symmetries. 

Note, finally, that the free energy and the order parameter equation for the pure recall states can 
be written in terms of Bessel functions as follows: 

2 Io{pm) (3 

The behaviour of these equations and the observable m for different noise levels is shown in figure 
^. One easily proves that \m\ < ^, and that lim^^oo m = \. Following the transition to a state 
with partial recall of a stored phase pattern at T = \, further reduction of the noise level T gives a 
monotonic increase of retrieval quality until retrieval is perfect at T = 0. 



5 Networks with Gaussian Distributed Synapses 

The type of analysis presented so far to deal with attractor networks breaks down if the number of 
patterns stored p no longer remains finite for N — ► oo, but scales as p = aN (a > 0). Expressions 
such as (4^,49) can no longer be evaluated by saddle-point methods, since the dimension of the 
integral diverges at the same time as the exponent of the integrand. The number of local minima 
(ergodic components) of Hamiltonians such as (3C,32) will diverge and we will encounter phenomena 



reminiscent of complex disordered magnetic systems, i.e. spin-glasses. As a consequence we will need 
corresponding methods of analysis, in the present case: replica theory. 

5.1 Replica Analysis 

Replica Calculation of the Disorder- Averaged Free Energy. As an introduction to the replica technique 
we will first discuss the equilibrium solution of a recurrent neural network model with binary neurons 
o~i G {—1, 1} in which a single pattern £ = (£i, . . . , £tv) £ {— 1, 1}^ has been stored (via a Hebbian-type 
recipe) on a background of zero-average Gaussian synapses (equivalent to the SK model, £L7|) : 
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in which Jq > measures the embedding strength of the pattern, and the Z\j (i < j) are independent 
Gaussian random variables. We denote averaging over their distribution by 7TT (the factors in ( [72] ) 
involving N ensure appropriate scaling and statistical relevance of the two terms, and as always 
Ju = 0). Here the Hamiltonian H (|30|), corresponding to sequential dynamics (H), becomes 



H(a) 



^NJ m 2 ((r) + \j - -^^aicrj 

2 1 v iV iKj 



(73) 



with the overlap m(cr) = J2 k cr^k which measures pattern recall quality. We clearly cannot calculate 
the free energy for every given realization of the synapses, furthermore it is to be expected that for 
N — > oo macroscopic observables like the free energy per neuron and the overlap m only depend on 
the statistics of the synapses, not on their specific values. We therefore average the free energy over 
the disorder distribution and concentrate on 



lim logZ, 



E< 



-0H(cr) 



The disorder average is transformed into an average of powers of Z, with the identity 

Z"-l 



logZ 



lim — 

n— >o n 



or, equivalently, 



logZ 



1 



lim — log Z n 

n^O n 



(74) 



(75) 



The so-called 'replica trick' consists in evaluating the averages Z n for integer values of n, and taking 
the limit n — > afterwards, under the assumption that the resulting expression is correct for non- 
integer values of n as well. The integer powers of Z are written as a product of terms, each of which 
can be interpreted as an equivalent copy, or 'replica' of the original system. The disorder-averaged 
free energy now becomes 

F = - lim — logZ" = - lim — log V e _/3 £Li H((ra) 
rwo Bn rwo Bn f-^ 

From now Roman indices will refer to sites, i.e. % = 1 . . . N, whereas Greek indices will refer to replicas, 

1 12 

i.e. a = 1 . . . n. We introduce a short-hand for the Gaussian measure, Dz = (2ir)~2e~2 z dz, and we 

1 2 i 1 

will repeatedly use the identity J Dz e xz = e^ x . Upon insertion of the Hamiltonian (|73|) we obtain 



F = -iiVlog2 - lim^oO^r 1 log : ( f^Ea^f ? j| 



i<j 



jDz 



}{0-} 



4iVlog2- lim n ^o(/9^) _1 log (e™ ^« ■^ a ? a f+ P 4N E« 7 Ei^ 



>{<r»} 



We now complete the sums over sites in this expression, 



E 



°i a 3 U i a i 



The averaging over the neuron states {cr a } in our expression for F will now factorize nicely if we insert 
appropriate 5-functions (in their integral representations) to isolate the relevant terms, using 



1 



dq \{5 

a0 



Qa/3- 



1 

N 



E<^ 





"iV" 


7 




.2tt_ 





J dqdq e iN ^ ia0 [ 9q/3 ~ * ^ a " a ^ 
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1 



/ dm JJ ( 





" N~ 


7 




.2tt. 





Jdmdm e^Ea^K-wEi^] 



The integrations are over the n x n matrices q and q and over the n- vectors m and m. After inserting 
these integrals we obtain 



lim F/N = — — log 2— lim lim — log ■ 

AT^oo ' (3 N^oo n-0 j3Nn 



N 



2ir 



n +n 



J dqdqdmdm e ~h^-\ n ^ 2 J 2 



N 



x e 



>{<T«} 



The neuronal averages factorise and are therefore reduced to single-site ones. A simple transformation 
Ci - * for all i eliminates the pattern components £j from our equations, and the remaining averages 
involve only one n-replicated neuron (a±, . . . ,a n ). Finally one assumes that the two limits n — ► and 
N —* oo commute. This allows us to evaluate the integral with the steepest-descent method: 



lim lim — log / dx e N *W = lim i im J_i oge JV «tr*+... = lim I extr$ 
N^oo n-+0 Nn J n— >Q N^oo Nn n— »0 n 

The result of these manipulations is 



(76) 



lim F/N = lim extr f(q,m;q,m) 

N—>oo n— >0 



(77) 



f(q,m;q,rh) 



1 1 

-5log2- — 

p pn 



log(e _i E a7 &n°'« CT 7- i Ea A « '«} tT 



7 2 



(78) 



ay a a ay 

Variation of the parameters {q a /3} and {m a } allows us to eliminate immediately the conjugate param- 
eters {cjap} and {rh a }, since it leads to the saddle-point requirements 

1 



la/3 = -jiP J Qa/3 



m a = ipjom a 

Upon elimination of {q a /3,ma} according to (|79|) the result (77]78|) is simplified to 



lim F/N = lim extr f(q,m) 

N^oo n— >0 



1 i o , @ J 2 2 , Jo 2 1 i / '- 



/3 2j2 E Q7 

Qa70" a (T 7 +/3 Jo m a <J a y 



(79) 



(80) 



(81) 



Variation of the remaining parameters {(fo/j} and {m a } gives the final saddle-point equations 

((X\a J2 E Q7 i<*i°<*°"<+P J o £ Q m " CTa ^ 



( e ^P 2j2 E Q7 ? Q7 o- Q o- 7 +/3Jo m a a a ^ 
(a\e^ 2j2 E^ 7 9a 7 ff Q CT 7 +/3J ^ a ™<xCTa \ 



i/3 2 J 2 £7 g a7 cr a o- 7 +/3J m a cr Q 



(82) 
(83) 
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The diagonal elements are always q aa = 1. For high noise levels, (3 — * 0, we obtain the trivial result 

Assuming a continuous transition to a non-trivial state as the noise level is lowered, we can expand the 
saddle-point equations (^J8*|) in powers of q and m and look for bifurcations, which gives (A 7^ p): 

q\ P = P 2 J 2 Q\p + 0(q, m) 2 m x = (3J m x + 0(q, mf 

Therefore we expect transitions either at T = Jo (if Jo > J) or at T = J (if J > Jo). The remaining 
program is: find the saddle-point (q, m) for T < max{ Jo, J} which for integer n minimises /, determine 
the corresponding minimum as a function of n, and finally take the limit n — > 0. This is in fact the 
most complicated part of the procedure. 



5.2 Replica-Symmetric Solution and AT-Instability 

Physical Interpretation of Saddle Points. To obtain a guide in how to select saddle-points we now turn 
to a different (but equivalent) version of the replica trick (f75|), which allows us to attach a physical 
meaning to the saddle-points (m, q). This version transforms averages over a given measure W: 

v ' (T (J (T\..(T n a =l 

-1 n n 

= h-E E ^(- 7 )I1^(0 (84) 

7=1 (j '...O"" a=l 

The trick again consists in evaluating this quantity for integer n, whereas the limit refers to non-integer 
n. We use (B4j) to write the distribution P(m) of overlaps in equilibrium as 



P M = v p - m a) = l™onE E *N-^2^^iUl e ^ ^ 

z^er 7 o" 1 ...<r n i a 

If we average this distribution over the disorder, we find identical expressions to those encountered in 
evaluating the disorder averaged free energy. By inserting the same delta-functions we arrive at the 
steepest descend integration (ff7|) and find 

P(m) = lim — 5 [m— m 7 ] (85) 

where {m 7 } refers to the relevant solution of ( p^j8^) . Similarly we can imagine two systems <x and a' 
with identical synapses {Jij}, both in thermal equilibrium. We now use (|84| ) to rewrite the distribution 
P (q) for the mutual overlap between the microstates of the two systems 



ZcT.tr' e - W)-W0 

v y X^y (T 1 ...tT n i a 



P(?) 



Averaging over the disorder again leads to the steepest descend integration (|77|) and we find 

Pjq) = lim — E^[?- 9A 7 ] (86) 
n-*o n(n— 1 , , 

v y A^7 
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where {^7} refers to the relevant solution of (B2,p3|). We can now partly interpret the saddle-points 
(m,q), since the shape of P(q) and P(m) gives direct information on the structure of phase space 
with respect to ergodicity. The crucial observation is that for an ergodic system one always has 

P(m)=6[m-±Y,^U ^) = %-^E^)eql ( 87 ) 



1 



If, on the other hand, there are L ergodic components in our system, each of which corresponding 
to a pure Gibbs state with microstate probabilities proportional to exp(—[3H) and thermal averages 
{■ ■ -)e, and if we denote the probability of finding the system in component t by We, we find 

L 1 L 1 

P{rn) =Y,Wi 6[m--Y,Zi(°i)t] P (l) = E <%"^E^>^M 

l=\ i i,l'=l i 

For ergodic systems both P{m) and P(q) are <5-functions, for systems with a finite number of ergodic 
components they are finite sums of 5-functions. A diverging number of ergodic components generally 
leads to distributions with continuous pieces. If we combine this interpretation with our results (|85|,86) 
we find that ergodicity is equivalent to the relevant saddle-point being of the form: 

(lap = S a /3 + q [1 - S a p] m a = m (88) 

which is called the 'replica symmetry' (RS) ansatz. The meaning of m and q is deduced from 
(taking into account the transformation cr^ — > £j<Tj we performed along the way): 

1 ^ Z , i 1 



m = T?E&^)eq 9=TtE^ 



Replica Symmetric Solution. Having saddle-points of the simple form 



leads to an enormous 



simplification in our calculations. Insertion of (|88| ) as an ansatz into the equations (pi , 82 ]83|) gives 
f(q,m) = -Il og 2- i/3J 2 (l-g) 2 + ij m 2 - i-log (eS^^E.^'+^E^-),, + 0(n) 



m 



We linearise the terms \%2 a era] 2 by introducing a Gaussian integral, and perform the average over the 
remaining neurons. The solutions m and q turn out to be well defined for n — * so we can take the 
limit: 



lim f(q,m) 



n-+0 



±\ og 2 - \pj\l-qf + l -J Q m 2 



q = J Dz tanh 2 \pJom-\-f3Jz^/q\ 
Writing the equation for m in integral form gives 



m 



Dz log cosh [pjom+pjz^/q] 
Dz tanh[(3J m+pjzy/q\ (90) 



m = flJom / dX 



Dz tanh 2 [\(5Jom+()Jz^/q\ 
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From this expression, in combination with (90), we conclude: 

T > Jq : m = T > Jq and T > J : m = g = 

Linearisation of ( |90| ) for small g and m shows the following continuous bifurcations: 

at from to 

Jo > J : T = Jo m = g = q > 

Jo < J : T = J m = q = m = 0, g > 

T < max{ J , J} : T = Jq[1 — q] m = 0, q > g>0 



Solving numerically equations T = Jq[1— q] and ( |90|) leads to the phase diagram shown in figure 10 



Breaking of Replica Symmetry: the AT Instability. If for the replica symmetric solution we calculate 
the entropy S = (3 dF/d(3 numerically, we find that for small temperatures it becomes negative. This 
is not possible. Firstly, straightforward differentiation shows dS/d(3 = (3[{H)1 q — (H 2 ) cq ] < 0, so S 
increases with the noise level T. Let us now write H(cr) = Hq + H(cr), where Hq is the ground state 
energy and H(cr) > (zero only for ground state configurations, the number of which we denote by 
N > 1). We now find 

limS= lim I log^e-^) + P(H) cq \ = lim [log £ e~^^ + (3(H) cq ] > logiV 

We conclude that S > for all T. At small temperatures the RS ansatz (^) is apparently incorrect in 
that it no longer corresponds to the minimum of f(q,m) (|8l|). If saddle-points without replica sym- 
metry bifurcate continuously from the RS one, we can locate the occurrence of this 'replica symmetry 
breaking' (RSB) by studying the effect on f(q,m) of small fluctuations around the RS solution. It 
was shown [|l9| that the 'dangerous' fluctuations are of the form 

q a /3 8 a/ 3 + q [1 - 5 al3 ] + 7] a p, ^2va/3 = Ma (91) 



in which q is the solution of fl90|) and r) a p = rjp a . We now calculate the resulting change in f(q,m), 
away from the RS value /(<Zrs> m Rs)j the leading order of which is quadratic in the fluctuations {ri a p} 
since the RS solution of (|90|) is a saddle-point: 



f(q,m) - f(q RS ,m RS ) = J2 vlj ~ J2 J2 VayV P xG a7p x 



with 

l„ „ „ „ hqP 2 J 2 \Y o- a ] 2 +/3mJ y a, 
"orypA — — 



Because of the index permutation symmetry in the above average we can write for a ^ 7 and p 7^ A: 

C a7p A = ^ap&yA + ^aA&yp + G4 [1 — <5 ap ] [l - ^a] [1~ $a\] [1 — <5 7 p] 
+ G 2 [1-(5 7 a]+5 7 A [l-^ap]+^aA [l-^ 7 p]+5 7 p [1 — 

with 

/Dz tanh^ [/?J m+/3Jz^/g] cosh™ [(3J m+f3Jz^/q\ 



Gi 



JDz cosh n [/3J m+f3Jzy/q] 
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Only terms which involve precisely two ^-functions can contribute, because of the requirements a 7^ 7, 
p 7^ A and Vap = 0. As a result: 



f(q,m) - f(q RS ,m RS ) = P — [l-{3 2 J 2 (1-2G 2 + G 4 )] £ n% 



The condition for the RS solution to minimise f(q,m), if compared to the so called 'replicon' fluctu- 
ations (91), is therefore 

1 > I3 2 J 2 lim (1 - 2G 2 + GA 

n— »0 

After taking the limit in the expressions Gg this condition can be written as 

1 > /3 2 J 2 Jdz cosh" 4 [8J m+f3Jz^\ (92) 

The so-called AT line in the phase diagram where this condition ceases to be met, indicates a continuous 
transition to a complex 'spin-glass' state where ergodicity is broken (i.e. the distribution P(q) (|86| ) is 
no longer a 5-function). It is shown in figure ^ as a dashed line for Jo/ J > 1, and coincides with the 
line T/J = 1 for J < 1. 



6 The Hopfield Model Near Saturation 

6.1 Replica Analysis 

We now turn to the Hopfield model with an extensive number of stored patterns, i.e. p = aN in (I4 1 



We can still write the free energy in the form (48), but this will not be of help since here it involves 



integrals over an extensive number of variables, so that steepest descent integration does not apply. 
Instead, following the approach of the previous model (72), we assume [18| that we can average the 



free energy over the distribution of the patterns, with help of the replica-trick (75): 



lim J- log y e-^ELx^) 



+0 Bn 



CT\.XT r 



Greek indices will denote either replica labels or pattern labels (it will be clear from the context), i.e. 
a, (5 = 1, . . . , n and fi, v = l,...,p. The p x N pattern components {£f } are assumed to be drawn 
independently at random from { — 1, 1}. 



Replica Calculation of the Disorder-Averaged Free Energy. We first add to the Hamiltonian of ( |30| ) 
a finite number I of generating terms, that will allow us to obtain expectation values of the overlap 
order parameters ( p|) by differentiation of the free energy (since all patterns are equivalent in the 
calculation we may choose these I nominated patterns arbitrarily): 

I ~ 
H -> H + V \ V ai$ (m»>eq = ^F/N (93) 



We know how to deal with a finite number of overlaps and corresponding patterns, therefore we average 
only over the disorder that is responsible for the complications: the patterns . . . , £ p } (as in the 

previous section we denote this disorder-averaging by 7TT ) . Upon inserting the extended Hamiltonian 
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into the replica-expression for the free energy, and assuming that the order of the limits N — > oo and 
n — > can be interchanged, we obtain for large N: 



F/N = l a -^ log2 -^pk ]ogu 



We linearise the \i < I quadratic term using the identity (|5q ) , leading to n x £ Gaussian integrals with 
Dm = (Dm\, . . . , Dm^): 



F/JV = ia-ilog2-Um^l- log /Dm, 



E^E^f^fv 7 ^-/^ 



g2W Ea E M >£ [Ei CT f C] Wg-al 



Anticipating that only terms exponential in the system size N will retain statistical relevance in the 
limit N — > oo, we rescale the n x I integration variables m according torn-* m^J f3N : 



F/iV = ^-^ log2 -^o^ log 



(3N 



2tt 



J dm e -^ Nm2 



(e 



p E M <, E„ £, K-Am] JE a E„>< K< ]' 



)«t«} (94) 



Next we turn to the disorder average, where we again linearise the exponent containing the pattern 
components using the identity (pq), with Dz = (Dzx, . . . , Dz n ): 

p-i 



^E„ (#) 3 Ei^6 



Dz ^Qcosh 



/dz e (£)*E« 
' E a «^^E i ^f+0(^) v " 



e 2iV A^ a p 



(95) 



We are now as in the previous case led to introducing the replica order parameters q a a: 



1= dq ]J5 

a/3 



1 



7a/3- 



AT 



Oi a, 





'N' 


"7 




.2vr_ 





Inserting (95) and the above identities into ( |94| ) and assuming that the limits iV — > oo and n — > 
commute gives: 

_ 1 1 1 r N\iJ2 a0 g amcl0 -^m^+alogfDZ e ^E a/3 *«» 

lim D/iV = —a— —log 2— lim lim — log / dmdqdq e L 

tv^oo ' 2/3 Af^oo n->o /5iVn & i y y 

x E M <f E a Ei K~ A mH E Qf 3 9«/3 E, ff "°fL ff 

The n-dimensional Gaussian integral over z factorises in the standard way after appropriate rotation 
of the integration variables z, with the result: 



log J Dz e* ZaZl3Q ^ = - i log det [1 - 



in which 1 denotes the n x n identity matrix. The neuron averages factorise and are reduced to 
single-site ones over the n-replicated neuron a = (<ti, . . . , a n ): 



— .11 1 , /" „ JV 
lim D/iV = —a log 2 — lim lim log / dmdqdq e 

tv^oo ' 2 (3 N->oon^of3Nn B J HH 



J2 a 0^1 a /3- iPm? - 1 a log det [1-/3C?] 
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and we arrive at integrals that can be evaluated by steepest descent, following the manipulations (|T 
If we denote averages over the remaining I patterns in the familiar way 

€ = &,...,&) <*(0> € = 2-^ £ *(0 

£e{-i,ip 

we can write the final result in the form 



lim F/N = lim extr f(m,q,q) 

N— too n— >0 



(96) 



f{m,q,q) = \a-±lo g 2-±- 



(log(e 



QapQap - \ fi m2 ~ ^alogdet [l-/3q] 

a/3 

Having arrived at a saddle-point problem we now first identify the expectation values of the overlaps 
with (]93| ) (note: extremisation with respect to the saddle-point variables and differentiation with 
respect to A commute): 

d 

(m M (<T))eq = lim lim — — extr f(m,q,q) 

n->0 X^Q dX/j, 



lim n (^ 

n— >0 



(97) 



which is to be evaluated in the A = saddle-point. Having served their purpose, the generating fields 
can be set to zero and we can restrict ourselves to the A = saddle-point problem: 



f{m, q, q) = -a - - log 2 - ^~ 

2 p pn 



'Y2,Q<*P ( l<xP ~ \ fi m2 - ^alogdet [l-(3q] 



ai3 



+ (log(e' "<'i t " a " " -)a)£ 
Variation of the parameters {m^, q a p, q a p} gives the saddle-point equations: 



9Ap = ( 



(cr A0 - / E M <« E Q E Q/ 3 V<*0<T a <Tg^ ^ 

(/E M <f Ea <T «^ m « - ' i Ea/3 9«/» t7 «' T |8^ . ' 

9An = -^ap — i : : 

yAp 2 ^ e -iz.[i-/Jg]z 



furthermore, 



(m M (<r))eq = lim - XI m » 



(98) 

(99) 

(100) 
(101) 

(102) 
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replaces the identification (|97|), As expected, one always has q aa = 1. The diagonal elements q aa drop 
out of ( p9 , |100 ), their values are simply given as functions of the remaining parameters by (101). 



Physical Interpretation of Saddle Points. We proceed along the lines of the Gaussian model (72). 
If we apply the alternative version (|84|) of the replica trick to the Hopfield model, we can write the 
distribution of the i overlaps m = (mi, . . . , mi) in equilibrium as 



7 cr\..cr n 



with £j = (£*,... 3 £|). Averaging this distribution over the disorder leads to expressions identical 
to those encountered in evaluating the disorder averaged free energy. By inserting the same delta- 
functions we arrive at the saddle-point integration (96,98|) and find 



P(m) 



lim — 5 [ 

n->0 n ^ 

7 



m—m 



7 J 



(103) 



where m 7 = (m* , . . . , m^) refers to the relevant solution of (9S, 100| , i01| ). 

Similarly we imagine two systems a and a' with identical realisation of the interactions {Jij}, both 
in thermal equilibrium, and use ( |84| ) to rewrite the distribution P(q) for the mutual overlap between 
the microstates of the two systems 



P(q) = lim 



o n{n- 



N 



-/3H{<r a ) 



Averaging over the disorder again leads to the steepest descend integration (96,|98) and we find 

1 



P(q) = lim 



n— >o n{n — 1) 



L<? - <?A 7 J 



(104) 



A^7 



where {^7} refers to the relevant solution of ( 99 , 1 0Q| , p^0l| ) . 

Finally we analyse the physical meaning of the conjugate parameters {q a p} for a ^ f3. We will do 
this in more detail, the analysis being rather specific for the Hopfield model and slightly different from 
the derivations above. Again we imagine two systems cr and <x' with identical interactions {Jij}, both 
in thermal equilibrium. We now use ( |84|) to evaluate the covariance of the overlaps corresponding to 
non-nominated patterns: 



1 



1 



- E (^E^M^E^ 



/eq 



(105) 



lim 



N-l/a 



n(n- 



' A^7 <jK..(T n 



AT ^ * ^ 



iV 



AT L^i U « 



rj e -/3H(o-«) 



(using the equivalence of all such patterns). We next perform the same manipulations as in calculating 
the free energy. Here the disorder average involves 



1 



E^f 



{ [dz e (4) 4 £„ - £, <*& 1" ' 1 / g£ g2 e (6)* £ Q - Ej f e» 



' 2l) q Eu>< [Si °T £f ] 
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p-l-1 



(after partial integration). We finally obtain an expression which involves the surface (|98|): 



1 



- lim 

p n(n- 



/ dmdqdq 



— > lim 



-%z-[i-f>q\z 



Jdz e -^z.[i-pq]z 



e -PnNf(m,q,q) 



X^p 



jdmdqdq e -PnNf(m,q,q) 



The normalisation of the above integral over {m, q, q} follows from using the replica procedure to 
rewrite unity. The integration being dominated by the minima of /, we can use the saddle-point 
equations (|101|) to arrive at 



lim 



1 



o n(n- 



l 



-ia[5 r 



(106) 



X^p 



The result (105, 1060 provides a physical interpretation of the order parameters {q a p}- 

Ergodicity implies that the distributions P(q) and P(m) are 5-functions, this is equivalent to the 
relevant saddle-point being of the form: 



m,, 



q~f P = 5 lp + q [1-5. 



- ,,1 q lp = -iaf3 2 [iM 7P + r [1S 1P }\ 



(107) 



which is the 'replica symmetry' (RS) ansatz for the Hopfield model. The RS form for {q a p} and 

is a direct consequence of the corresponding distributions being <5-functions, whereas the RS form for 



{qafi} subsequently follows from (101). The physical meaning of m p and q is 



cq 



<Ti. 



cq 



Before proceeding with a full analysis of the RS saddle-point equations, we finally make a few tentative 
statements on the phase diagram. For f3 = we obtain the trivial result q\ p = 5\ p , q\ p = 0, = 0. 
We can identify continuous bifurcations to a non-trivial state by expanding the saddle-point equations 
in first order in the relevant parameters: 



f3m£ + ... 



q\ P 



-2iq Xp + ... (X^p), 



q\ P 



1 ia(3 
21^/3 



ft 



:Qx P [IS. 



x P \ 



+ 



1-/3 



qx P + - 



Thus we expect a continuous 



Combining the equations for q and q gives q\ p = a 
transition at T = 1+y/a from the trivial state to an ordered state where q\ p ^ 0, but still (m /J ) eq = 
(a spin-glass state). 



6.2 Replica Symmetric Solution and AT-Instability 

The symmetry of the ansatz (|107|) for the saddle-point allows us to diagonalise the matrix A = l—(3q 
which we encountered in the saddle-point problem, A a p = [1— (3(1— q)]S a p — /3q: 

eigenspace : eigenvalue : multiplicity : 

x = (!,...,!) l-(3(l-q)-(3qn 1 
£ a x Q = 1-/3(1-?) n-1 



so that 



logdetA = log [l-p(l-q)-pqn] + (n-1) log [l-/3(l-g)] 
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n 



log [!-/?(!-</)] 



(3q 



+ 0(n 2 ) 



l-0(l-q)_ 

Inserting the RS ansatz (|107| ) for the saddle-point into (|98|), utilising the above expression for the 
determinant and the short-hand m = (mi, . . . , mi), gives 



/(™RS,<?rs,<?rs) 



-- log 2 + -a [\+pr(l-q)\ + -m 2 + ^ 



1 

"/to 



(log(e 



log [1-/3(1-9)] 



+ 0(n) 



/3<7 



1-/3(1-9) 



We now linearise the squares in the neuron averages with fl56|) , subsequently average over the replicated 
neuron <r, use cosh"[x] = 1 + n log cosh [x] + 0(n 2 ), and take the limit n — > 0: 



lim F RS /iV= hm/(m RS ,Q RS ,g RS ) 

AT— >oo n— >0 



-m H — a 
2 2 



i +M i- 9 ) + Ii„ g| i- /3 (i- 5) ]- I -^_ y 



y log2cosh/3 [m • ^^y^ar]^ 



(108) 



The saddle-point equations for m, q and r can be obtained either by insertion of the RS ansatz (|107|) 
into (^,100,101) and subsequently taking the n —> limit, or by variation of the RS expression ( |108| ). 
The latter route is the fastest one. After performing partial integrations where appropriate we obtain 
the final result: 

m = (£ J Dz tanh (3 [m ■ £,+z^/ar] )^ (109) 



q = (J Dz tanh 2 /3 [m • $ + z-/ar])^ 



9 [1-/3(1-9)]" 



(110) 



By substitution of the equation for r into the remaining equations this set can easily be further reduced, 
should the need arise. In case of multiple solutions of (109,110) the relevant saddle-point is the one 
that minimises ( |108| ). Clearly for a = we recover our previous results ©H). 

Analysis of RS Order Parameter Equations and Phase Diagram. We first establish an upper bound 
for the temperature T = 1/(3 for non-trivial solutions of the set ( |109| , |110| ) to exist, by writing (109) in 
integral form: 

m ii — (£ " m ) J d\jDz 1 — tanh 2 (3 (A£ • m+z\/ar) 
from which we deduce 

= m 2 -l3{{£ ■ m) 2 j\\ J Dz [l-tanh 2 (3 (A£ • m+z^P)\)^ > m 2 -(3{{£ ■ m) 2 )^ = m 2 [1-/3] 

Therefore m = for T > 1. If T > 1 we obtain in turn from (110), using tanh 2 (x) < x 2 and < q < 1: 
q = or q < 1+yfa—T. We conclude that q = for T > 1 + y/a. Secondly, for the free energy ( |108| ) 
to be well defined we must require q > 1 — T. Linearisation of (10E.110) for small q and m shows the 
continuous bifurcations: 



a > : 
a = : 



at 

T= 1 + y/cl 
T = 1 



from 

m = 0, q 
m = 0, q 



to 

m = 0, q > 
m / 0, q > 



The upper bound T = l+y/a turns out to be the critical noise level indicating (for a > 0) a continuous 
transition to a spin-glass state, where there is no significant alignment of the neurons in the direction 
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of one particular pattern, but still a certain degree of local freezing. Since m = for T > 1 this 
spin- glass state persists at least down to T = 1. The quantitative details of the spin-glass state are 
obtained by inserting m = into ( |110[ ) (since (|10S ) is fulfilled automatically) . 

The impact on the saddle-point equations ( 109j , 110 ) of having a > 0, a smoothening of the hyper- 
bolic tangent by convolution with a Gaussian kernel, can be viewed as noise caused by interference 
between the attractors. The natural strategy for solving ( |109|jll0| ) is therefore to make an ansatz 
for the nominated overlaps m of the type ( |52| ) (the mixture states). Insertion of this ansatz into the 
saddle-point equations indeed leads to self-consistent solutions. One can solve numerically the remain- 
ing equations for the amplitudes of the mixture states and evaluate their stability by calculating the 
eigenvalues of the second derivative of f(m, q, q), in the same way as for a = 0. The calculations are 
just more involved. It then turns out that even mixtures are again unstable for any T and a, whereas 
odd mixtures can become locally stable for sufficiently small T and a. Among the mixture states, 
the pure states, where the vector m has only one nonzero component, are the first to stabilise as the 
temperature is lowered. These pure states, together with the spin-glass state (m = 0, q > 0), we will 
study in more detail. 

Let us first calculate the second derivatives of (108) and evaluate them in the spin-glass saddle- 
point. One finds, after elimination of r with ( [1101) : 

d 2 f/dm ll dq = 



&f/dm ll dm v = V [\-(3{l-q)} 



The (^+1) x matrix of second derivatives with respect to variation of (m,q), evaluated in the 

spin-glass saddle-point, thereby acquires a diagonal form 



d 2 f 



( 1-/3(1- 



V 



1-/3(1- 



d 2 f/dq 2 J 



and the eigenvalues can simply be read off. The Mold degenerate eigenvalue 1— 0(1— q) is always positive 



(otherwise ( 108 ) would not even exist), implying stability of the spin-glass state in the direction of 
the nominated patterns. The remaining eigenvalue measures the stability of the spin-glass state with 
respect to variation in the amplitude q. Below the critical noise level T = 1 + it turns out to be 
positive for the spin-glass solution of ( |110 ) with nonzero q. One important difference between the 
previously studied case a = and the present case a > is that there is now a m = spin-glass 
solution which is stable for all T < l + y'a. In terms of information processing this implies that for 
a > an initial state must have a certain non-zero overlap with a pattern to evoke a final state with 
m ^ 0, in order to avoid ending up in the m = spin-glass state. This is clearly consistent with the 
observations in figure |5[ In contrast, for a = 0, the state with m = is unstable, so any initial state 
will eventually lead to a final state with m ^ 0. 

Inserting the pure state ansatz m = m(l, 0, . . . , 0) into our RS equations gives 



m 



Dz tanh 



(5m + 



zf3J~aq 



1-/3(1- 



Dz tanh 2 



(5m + 



z(5Jaq 



l-P(l-q) 



(111) 



/ 



— m H — a 
2 2 



l + /3(l- g )(/3-2)l 
V ; [l-/3(l-g)] 2 1 V '\ 



Dz log 2 cosh 



(5m+- 



z0J~aq 



(3 J ~ "° \T"~ ' l-(3(l-q) 

(H2) 

If we solve the equations (p. 1 1|) numerically for different values of a, and calculate the corresponding 



'free energies' / (|112j ) for the pure states and the spin-glass state m = 0, we obtain figure 11. For 
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a > the nontrivial solution m for the amplitude of the pure state appears discontinously as the 
temperature is lowered, defining a critical temperature 7m (a). Once the pure state appears, it turns 
out to be locally stable (within the RS ansatz). Its 'free energy' /, however, remains larger than the 
one corresponding to the spin-glass state, until the temperature is further reduced to below a second 
critical temperature T c {a). For T < T c (a) the pure states are therefore the equilibrium states in the 
thermodynamics sense. 

By drawing these critical lines in the (a,T) plane, together with the line T g (a) = 1 + \/a which 
signals the second order transition from the paramagnetic to the spin-glass state, we obtain the RS 
phase diagram of the Hopfield model, depicted in figure [l^. Strictly speaking the line Tj\j would appear 
meaningless in the thermodynamic picture, only the saddle-point that minimises / being relevant. 
However, we have to keep in mind the physics behind the formalism. The occurrence of multiple 
locally stable saddle-points is the manifestation of ergodicity breaking in the limit N —* oo. The 
thermodynamic analysis, based on ergodicity, therefore applies only within a single ergodic component. 
Each locally stable saddle-point is indeed relevant for appropriate initial conditions and time-scales. 

Zero Temperature, Storage Capacity. The storage capacity a c of the Hopfield model is defined as the 
largest a for which locally stable pure states exist. If for the moment we neglect the low temperature 
re-entrance peculiarities in the phase diagram (|l^) to which we will come back later, the critical 
temperature Tm(oc), where the pure states appear decreases monotonically with a, and the storage 
capacity is reached for T = 0. Before we can put T — > in ( 11 1| ) , however, we will have to rewrite 
these equations in terms of quantities with well defined T — ► limits, since q — > 1. A suitable quantity 
is C = (3(1 — q), which obeys < C < 1 for the free energy ( [L08|) to exist. The saddle-point equations 
can now be written in the form 



m 



Dz tanh 



(3m+ 



z(3Jaq 



1-C 



d 

C = — I Dz tanh 
dm 



(3m + 



z(3Jaq 



1-C 



in which the limit T —* simply corresponds to tanh(/?rc) 
the limit we perform the Gaussian integral: 



m = erf 



m(l-C) 



C = (1-C) 



sgn(x) and q — ► 1. After having taken 



-m 2 (l-C) 2 /2a 



This set can be reduced to a single transcendental equation by introducing x = m(l — C)/\f2a: 

2x _.,.2 



x\[2a~ = Fix) 



F(x) = erf(x) 



7T 



(113) 



Equation ( |113j ) is solved numerically (see figure ^). Since F(x) is anti-symmetric, solutions come in 
pairs (x, —x) (reflecting the symmetry of the Hamiltonian of the system with respect to an overall 
state-flip <t — > —tr). For a < a c ~ 0.138 there indeed exist pure state solutions i / 0. For a > a c 
there is only the spin-glass solution x = 0. Given a solution x of ( |113j ), the zero temperature values 
for the order parameters follow from 



lim m 



erf [x] 



lim C 

T->0 




with which in turn we can take the zero temperature limit in our expression ( |1 12|) for the free energy: 

xy/n erf [x] +e~~ 3 



lim f = -erf 2 [xl + — e x2 - 



2 

7T 



„2 I air 
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Comparison of the values for lim-p^o / thus obtained, for the pure state m > and the spin-glass 
state m = leads to figure 13 , which clearly shows that for sufficiently small a the pure states are the 
true ground states of the system. 



The AT- Instability. As in the case of the Gaussian model flT^), the above RS solution again generates 
negative entropies at sufficiently low temperatures, indicating that replica-symmetry must be broken. 
We can locate continuous replica symmetry breaking by studying the effect on f(m, q, q) (^) of small 
replicon [19] fluctuations around the RS solution: 

q a /3 -> SaP + q [l-5 a p] + Va/3, Vaf3 = VPa Vaa = 



VaP = 



(114) 



The variation of q induces a similar variation in the conjugate parameters q through equation (|101|): 



1 



7a/3 



with 



9a/3~/5 



ia(3 2 [R5 a/3 + r [l-5 a/3 ] + fj a p] 



jdz z a zpz 1 z s e 2 z \ 1 -^1-bs\ z 



1 



jdz e-^'I 1 -^] 2 ^ jdz e -3*-M«lRs]* 

Wick's theorem (see e.g. [||) can now be used to write everything in terms of second moments of the 
Gaussian integrals only: 

9a/3jS = 9a0~i& + 9a-y9/3S + 9a590-) 

with which we can express the replicon variation in q, using the symmetry of {f]ai3} and the saddle- 
point equation ( |101| ), as 



J2 %<5 \9. 



'a/3-yS — 9a/39-yS\ 



jdz z a z p e-^ z \ l -^\ z 



(115) 



7(5 



since only those terms can contribute which involve precisely two 5-symbols, due to J2 a Va/3 = 0. We 
can now calculate the change in f(m,q,q), away from the RS value /(wirs, <7 R g, 9rs)j the leading 
order of which must be quadratic in the fluctuations {r] a /3} since the RS solution is a saddle-point: 



/(m RS , q, q) - / (m RS , q RS , Q RS ) 



1 

0n 



1 , det [I-/3(q Rq +7?)] m rA 



1 „ / e ^-rn RS J2 a ^^i(T-[q RS +^ia^f]]CT ) 

+ -a/? 2 Tr [f,.r,+f,.q RS ] - (log {€ ; g£ m v ; lCT d a 

2 / e P5- m Rs2^ a CTc " _l<T ' < J , R.s <T \ (J . 



(116) 



Evaluating ( |116| ) is simplified by the fact that the matrices q R g and rj commute, which is a direct 
consequence of the properties (114) of the replicon fluctuations and the form of the replica-symmetric 
saddle-point. If we define the nxn matrix P as the projection onto the vector (1, . . . , 1), we have 



P. 



af3 



n 



P/q = rj.P = 



[I-#Zr 



1 



-H 



l-q)I + nqP q RS .r/ = r/.q RS = (l-q)ri (117) 
0nq „ 



'RSJ l-^l-,)" [l-p {1 - q )-f3nq][l-P(l-q)Y 

We can now simply expand the relevant terms, using the identity logdetM = Tr logM: 

det [1 - /% RS + rj)] 



log' 



det [I - (3q RS \ 



Trlog l-Prjll-Pq^]- 1 
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Tr { -f3 V [1 - Pq^]- 1 - l -f3 2 [r, [1 - fiq^}- 1 } 2 \ + 0( V ^ 



1 



2 



r Tr rj 2 + 0{rf ) 



- ., - , (118) 
2[l-/3(l- (? )] 2 

Finally we address the remaining term in ( |116| ), again using the RS saddle-point equations ( 1091 ,110) 
where appropriate: 



(lot 



l + \a(3 2 cr ■ i)a + la 2 l3 4 (a ■ i)cr) 2 + . 



with 



H a /3~y5 — ( 



■ ^a/3 2 Tr[T).q RS ] + ^a 2 /? 4 Vapfj-ysiGap-yS-Hap-ys] + ■■■ 

a/3-fS 



(119) 



^tm ns Y /a ^(r-q RS a^ ^■m RS Y /a ^-i(T-q IiS (T^ '€ 
Inserting the ingredients ( |115| , 117j]118| , |119| ) into expression (116) and rearranging terms shows that the 



linear terms indeed cancel, and that the term involving H a p y s does not contribute (since the elements 
H a p 7 s don't depend on the indices for a ^ (3 and 7 / <5), and we are left with: 



/(m RS , q, q) - f(m RS , q RS , q RS ) 



1 



a(3 2 



4 [1-/3(1-,)]' 



1 



Tr if + -a/3 4 (i?-r) 2 Tr rf 



+ 



a/375 

Because of the index permutation symmetry in the neuron average we can write for a/7 and p 7^ A: 
Gajpx = S a pSjx+5 a x5 7P + G4 [l — S ap ] [l — 5jx] [l—5 a \] [1 — 5 7P ] 

+ G 2 {S ap [1-(5 7 a]+(5 7 a [l-S ap ]+5 a x [1-5 1P ]+5 1P [l-8 a \]} 



with 



,JDz tank (3 [m ■ £ + Zy/ar] cosh" (3 [m ■ £+Zy/ar\. 
G i = { rn i.n /■ I I 7^5 >£ 



/Dz cosh n /3 [m • ^+z v / ar] 

Only terms which involve precisely two (^-functions can contribute, because of the replicon properties 
(HI). As a result: 



/(turs, <?, q) - /(mas, q RS , q RS ) 



1 

/fa 



Tr rf 



1 



a/3 2 



1 



4 [1-/3(1-^ 2 



+ -a(3*{R-r) 



-a 2 f3\R-r) 4 [l-2G 2 + G 4 ] 



+ ... 
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Since TV rj 2 = J2a/3 Vapi the condition for the RS solution to minimise f(m, q, q), if compared to the 
'replicon' fluctuations, is therefore 



+ 2(3 2 {R-r) 2 - a(3 6 (R-r) 4 [1-2G 2 + G 4 ] > 



[l-(3(l-q)] 2 

After taking the limit in the expressions Ge and after evaluating 



lim R = — lim q aa 

n-»0 /?n->(T 



lim 



1 Jdz z 2 e~^-M^] z 
5n/3 j dz e -±z-[i-pq RS ]z 



lim — 



n — 1 



+ 



1 



.1-/3(1-^) 1-/3(1-^+7^) 
and using ( |110|) , the condition (|120|) can be written as 



1 l-/3+20q 
(3[l-(3(l-q)Y 



[l-/3(l-g)] 2 > a(3 2 (jDz cosh" 4 (3 [m • $, + z^F\ )| 



(120) 



(121) 



The AT line in the phase diagram, where this condition ceases to be met, indicates a second-order 
transition to a spin-glass state where ergodicity is broken in the sense that the distribution P(q) (gog) 
is no longer a 5-function. In the paramagnetic regime of the phase diagram, m = and q = 0, the 
AT condition reduces precisely to T > T g = l + ^/a. Therefore the paramagnetic solution is stable. 
The AT line coincides with the boundary between the paramagnetic and spin-glass phase. Numerical 
evaluation of ( |121| ) shows that the RS spin-glass solution remains unstable for all T < T g , but that 
the retrieval solution m ^ is unstable only for very low temperatures T < Tr (see figure |i~2|). 



7 Epilogue 

In this paper I have tried to give a self-contained expose of the main issues, models and mathematical 
techniques relating to the equilibrium statistical mechanical analysis of recurrent neural networks. I 
have included networks of binary neurons and networks of coupled (neural) oscillators, with various 
degrees of synaptic complexity (albeit always fully connected), ranging from uniform synapses, via 
synapses storing a small number of patterns, to Gaussian synapses and synapses encoding an extensive 
number of stored patterns. The latter (complex) cases I only worked out for binary neurons; similar 
calculations can be done for coupled oscillators (see [pi]). Networks of graded response neurons could 
not be included, because these are found never to go to (detailed balance) equilibrium, ruling out 
equilibrium statistical mechanical analysis. All analytical results and predictions have later also been 
confirmed comprehensively by numerical simulations. Over the years we have learned an impressive 
amount about the operation of recurrent networks by thinking in terms of free energies and phase 
transitions, and by having been able to derive explicit analytical solutions (since a good theory always 
supersedes an infinite number of simulation experiments ...). I have given a number of key references 
along the way; many could have been added but were left out for practical reasons. Instead I will just 
mention a number of textbooks in which more science as well as more references to research papers 
can be found. Any such selection is obviously highly subjective, and I wish to apologize beforehand to 
the authors which I regret to have omitted. Several relevant review papers dealing with the statistical 
mechanics of neural networks can be found scattered over the three volumes [2^]. Textbooks 

which attempt to take the interested but non-expert reader towards the expert level are |8f, |2^]. Finally, 
a good introduction to the methods and backgrounds of replica theory, together with a good collection 
of reprints of original papers, can be found in [p^ j. 
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What should we expect for the next decades, in the equilibrium statistical mechanics of recurrent 
neural networks ? Within the confined area of large symmetric and fully connected recurrent networks 
with simple neuron types we can now deal with fairly complicated choices for the synapses, inducing 
complicated energy landscapes with many stable states, but this involves non-trivial and cutting-edge 
mathematical techniques. If our basic driving force next is the aim to bring our models closer to 
biological reality, balancing the need to retain mathematical solvability with the desire to bring in 
more details of the various electro-chemical processes known to occur in neurons and synapses and 
spatio-temporal characteristics of dendrites, the boundaries of what can be done with equilibrium 
statistical mechanics are, roughly speaking, set by the three key issues of (presence or absence of) 
detailed balance, system size, and synaptic interaction range. The first issue is vital: no detailed 
balance immediately implies no equilibrium statistical mechanics. This generally rules out networks 
with non-symmetric synapses and all networks of graded response neurons (even when the latter are 
equipped with symmetric synapses). The issue of system size is slightly less severe; models of networks 
with N < co neurons can often be solved in leading order in N~2, but a price will have to be paid in 
the form of a reduction of our ambition elsewhere (e.g. we might have to restrict ourselves to simpler 
choices of synaptic interactions). Finally, we know how to deal with fully connected models (such as 
those discussed in this paper), and also with models having dendritic structures which cover a long 
(but not infinite) range, provided they vary smoothly with distance. We can also deal with short- 
range dendrites in one-dimensional (and to a lesser extent two dimensional) networks; however, since 
even the relatively simple Ising model (mathematically equivalent to a network of binary neurons 
with uniform synapses connecting only nearest-neighbour neurons) has so far not yet been solved 
in three dimensions, it is not realistic to assume that analytical solution will be possible soon of 
general recurrent neural network models with short range interactions. On balance, although there 
are still many interesting puzzles to keep theorists happy for years to come, and although many of 
the model types discussed in this text will continue to be useful building blocks in explaining at a 
basic and qualitative level the operation of specific recurrent brain regions (such as the CA3 region of 
the hippocampus) , one is therefore led to the conclusion that equilibrium statistical mechanics has by 
now brought us as far as can be expected with regard to increasing our understanding of biological 
neural networks. Dale's law already rules out synaptic symmetry, and thereby equilibrium statistical 
mechanics altogether, so we are forced to turn to dynamical techniques if we wish to improve biological 
realism. 
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Figure 2: The functions f se<i (m)/T (left) and f pai (m)/T (middle) for networks of binary neurons and 
uniform synapses, and for different choices of the re-scaled interaction strength J/T (T = (3~ 1 ). Left 
picture (sequential dynamics): J/T = — |,— |,— ^, ^, |, | (from top to bottom). Middle picture (parallel 
dynamics): J/T = ±|,±|,±i (from top to bottom, here the free energy is independent of the sign 
of J). The right picture gives, for J > 0, the location of the non-negative minimum of f seq (m) and 
fpar(fn) (which is identical to the average activity in thermal equilibrium) as a function of T/J. A 
phase transition to states with non-zero average activity occurs at T/J = 1. 
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Figure 3: Information represented as specific microscopic neuronal firing patterns £ in an N = 841 
Hopfield network and drawn as images in the plane (black pixels: £j = 1, white pixels: £j =— 1). 
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Figure 4: Information processing in a sequential dynamics Hopfield model with ./V = 841, p = 10 and 
T = 0.1, and with the p = 10 stored patterns shown in figure || Left pictures: dynamic reconstruction 
of a stored pattern from an initial state which is a corrupted version thereof. Top left: snapshots of 
the system state at times t = 0,1,2,3,4 iterations/neuron. Bottom left: values of the overlap order 
parameters as functions of time. Right pictures: evolution towards a spurious state from a randomly 
drawn initial state. Top right: snapshots of the microscopic system state at times t = 0,1,2,3,4 
iterations/neuron. Bottom right: values of the overlap order parameters as functions of time. 
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Figure 5: Simulations of a parallel dynamics Hopfield model with N = 30,000 and a = T = 0.1, and with 
random patterns. Left: overlaps m = m\{a) with pattern one as functions of time, following initial 
states correlated with pattern one only, with mi(<r(0)) G {0.1, . . . ,0.9}. Right: corresponding flow in 
the (m,r) plane, with r = a -1 J2u>i m fi( cr ) measuring the overlaps with non-nominated patterns. 
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Figure 6: Distributions of interference noise variables = jj J2fi>i J2j^i£j a ji as measured in 
the simulations of figure [|, at t = 10. Uni-modal histogram: noise distribution following m(0) = 0.9 
(leading to recall). Bi-model histogram: noise distribution following m(0) =0.1 (not leading to recall). 
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Figure 7: Left picture: Amplitudes m n of the mixture states as functions of temperature. From top 
to bottom: n = 1,3,5,7,9,11,13. Solid: region where they are stable (local minima of /). Dashed: 
region where they are unstable. Right picture: corresponding 'free energies' f n . From bottom to top: 
n = 1,3, 5, 7, 9, 11, 13. Dashed line: 'free energy' of the paramagnetic state m = (for comparison). 
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Figure 8: The function f{q)/T (left) for networks of coupled oscillators with uniform synapses Jij = 
J/N, and for different choices of the re-scaled interaction strength J/T (T = I3~ v ): J/T =— |,— 1, 1, | 
(from top to bottom). The right picture gives, for J > 0, the location of the non- negative minimum of 
f(q) (which measures the overall degree of global synchronisation in thermal equilibrium) as a function 
of T/J. A transition to a synchronised state occurs at T/J = \. 
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Figure 9: The function f(m)/T (left) for networks of coupled oscillators with phase patterns stored via 
the synapses Jy = iV -1 cos [£f ~~ £j ]> an d for different choices of (3 = T -1 : (3 = 1, 3, 5, 7 (from top to 
bottom). The right picture gives the location of the non- negative minimum of f(m) (which measures 
the overall degree of global synchronisation with one recalled phase pattern in thermal equilibrium) 
as a function of T. A transition to a recall state occurs at T = \. 
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Figure 10: Phase diagram of the model (|7^) with Gaussian synapses, obtained from the replica- 
symmetric solution. P: paramagnetic phase, m = q = (more or less random evolution). SG: 
spin-glass phase, m = 0, q ^ ('frozen' equilibrium states without pattern recall). F: recall ('ferro- 
magnetic') phase, m / 0, g / 0. Solid lines: second-order transitions. Dashed: the AT instability. 




Figure 11: Left: RS amplitudes m of the pure states of the Hopfield model versus temperature. From 
top to bottom: a = 0.000 — 0.125 (Aa = 0.025). Right, solid lines: 'free energies' / of the pure 
states. From bottom to top: a = 0.000 — 0.125 (Aa = 0.025). Right, dashed lines: 'free energies' of 
the spin-glass state m = (for comparison). From top to bottom: a = 0.000 — 0.125 (Aa = 0.025). 



7 EPILOGUE 



51 




Figure 12: Phase diagram of the Hopfield model. P: paramagnetic phase, m = q = (no recall). SG: 
spin-glass phase, m = 0, q / (no recall). F: pattern recall phase (recall states minimise /), m / 0, 
q 0. M: mixed phase (recall states are local but not global minima of /). Solid lines: separations of 
the above phases (T g : second order, Tm and T c : first order). Dashed: the AT instability for the recall 
solutions (Tr). Inset: close-up of the low temperature region. 





Figure 13: Left: solution of the transcendental equation F(x) 



2a, where x = erf 1 



The 



storage capacity a c ~ 0.138 of the Hopfield model is the largest a for which solutions x / exist. 
Middle picture: RS amplitudes m of the pure states of the Hopfield model for T = as a function of 
a = p/N. The location of the discontinuity, where m vanishes, defines the storage capacity a c ~ 0.138. 
Right picture, solid line: T = 'free energy' / of the pure states. Dashed lines: T = 'free energy' of 
the spin-glass state m = (for comparison). 
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